CHARACTER AND PERSONALITY TESTS


BY GOODWIN WATSON 
Teachers College, Columbia University 










A review of two years’ progress in a rapidly growing field within 
the scope of sixteen pages imposes severe restrictions. After elim- 
inating studies dealing with the psychogalvanic reflex and other 
possible physiological indications of attitude, the studies in constitu- 
tion and its significance for typology, the clinical studies using 
observation, analysis and case studies of individuals, and most of the 
theoretical discussions of the nature of character, there remain some- 
thing like three hundred contributions. About a third of these have 
had to be omitted from the bibliography. The references omitted 
were those in which the character testing was more or less limited 
and incidental, those in which an author repeats his contribution in a 
second or third publication, and some others which offered no unique 
findings. The names mentioned in the text but not followed by a 
reference to the bibliography refer to such studies as had to be 
omitted. Interested readers can doubtless locate such studies through 
the Psychological Index or Psychological Abstracts. 

Table I presents a general survey of the contributions during the 
past two years.* The largest number of studies have dealt with 
symptom description, these representing about 25 per cent of all the 
publications in this field. Opinion studies and investigations of basic 
temperamental factors each represent about half as many. In the
following discussion each topic will be treated according to the order 
used in Table I. 


* A few important materials published in 1930 but not listed in the bibliog-
raphy appearing in the Psychological Bulletin for February, 1932, have been
included in this list, so that the cumulative record may be fairly complete. 
TABLE I
DISTRIBUTION OF STUDIES BY MAJOR THEMES


(Based on 300 publications during 1931 and 1932. Some studies 
appear under more than one category) 


[The frequency column of this table and several row labels are illegible
in the source; illegible labels are marked as such.]

I. GENERAL
     Summaries
     Theory and technique

II. REPUTATION, RATINGS
     Haggerty-Olson-Wickman Scale
     New scales and techniques
     [illegible]

III. SYMPTOM DESCRIPTION
     Neurotic Symptoms
          [illegible]
          Woodworth, etc.
          [illegible]
          [illegible]
          Newly standardized
          Others
     Introversion-Extroversion
          [illegible]
          Colgate
          Freyd
          Neyman-Kohlstedt
          Gilliland-Morgan
          New scales standardized
          Others
     Inferiority
          Heidbreder
          Newly standardized
          Others
     Ascendancy-Submission
     Happiness, Work Satisfaction, etc.

IV. MORAL KNOWLEDGE AND OPINIONS
     Use of previously standardized tests
     Newly standardized
     Others (special purposes, etc.)

V. MORAL CONDUCT
     [illegible]
     Coöperation
     Persistence
     Industry

VI. BEHAVIOR OBSERVATION: SHORT SAMPLE TECHNIQUES
     Preschool and kindergarten
     Older

VII. OPINION STUDIES
     Socio-[illegible], etc.
     Religious
     Prejudice, fairmindedness
     Educational, etc.
     Factors creating and changing opinion

VIII. PERSONAL ATTITUDES, SELF INSIGHT
     Self-Others-Ideal [illegible]
     Miscellaneous

IX. EMOTIONAL CONTROL
     Word associations
     Pressey X-O
     Neurodynamic age (integration)
     Developmental age
     [illegible]
     Annoyance

[Heading illegible]
     Suggestibility, susceptibility to hypnosis
     Rorschach test
     Typological discriminations
     [illegible]
     [illegible]
     Speed of decision
     Downey Will-Temperament
     [remaining rows illegible]

Comprehensive reviews of character and personality tests have
been given by Symonds (150), Murphy (106), and Watson (110,
164, 165). The following reviews deal with special areas: attitudes
and opinions, Droba (35), Laswell, Sherman (137), Murphy (106);
ratings, Bradshaw (18); physiological indices, Landis (77),
Larson (79); introversion-extroversion, Bailey; religious education,
Smith and Bathurst, Mayer (99); contributions to typology,
Pfahler (123), Wertham (172); behavior observation, Thomas (153),
Murphy (107); character tests, Hartshorne (58), Shuttleworth.
May (98) and Olson emphasize the inadequacy of the trait idea, but 
suggest that behavior samples can be collected through proper control 
of situations and training of observers. Wells (170) stresses the
connection which ought to exist between our measurement and some 
such objective as energetic, socialized conduct directed toward remote 
ends. Symonds (151) lists two hundred problems needing research. 
Bain contributes to our knowledge of reliability by analyzing ques- 
tionnaire items which show much or little change. Lents and asso- 
ciates (83) compared five methods for evaluating test items, and 
decided that the use of the upper and lower third of a criterion group 
gave the best approach. 
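
The upper-and-lower-third approach to item evaluation can be illustrated
with a short sketch. The Python below is only an illustration under stated
assumptions and is not the procedure of the study cited: the function name,
the data layout, and the use of a simple difference in pass rates as the
index are all hypothetical choices made for the example.

```python
def upper_lower_third_index(item_passed, criterion_scores):
    """Pass-rate difference between the top and bottom thirds of a criterion group.

    item_passed: list of 0/1 flags, one per person, for a single test item.
    criterion_scores: list of criterion scores in the same person order.
    (Both names and this index are assumptions for illustration.)
    """
    if len(item_passed) != len(criterion_scores):
        raise ValueError("inputs must have the same length")
    n = len(criterion_scores)
    k = n // 3  # size of each extreme group
    if k == 0:
        raise ValueError("need at least three persons")
    # Rank persons from lowest to highest criterion score.
    order = sorted(range(n), key=lambda i: criterion_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_passed[i] for i in upper) / k
    p_lower = sum(item_passed[i] for i in lower) / k
    # Positive values mean the item is passed more often by the upper third.
    return p_upper - p_lower
```

An item passed equally often by both extreme groups yields an index of
zero, which under this sketch would mark it as contributing nothing to the
discrimination the criterion is meant to capture.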

Ratings have played a prominent rôle in the following studies:
1, 21, 42, 59, 61, 115, 128, 156, 180, 187; Leal, O’Shea, Walsh, 
McElwee. Bradshaw (18) contributes a new scale of the graphic 
sort with the addition of behaviorgrams in which traits can be 
amplified by brief sketches of conduct. Conrad (25) introduced the 
element of perspective into ratings by having teachers star the traits
which were of central importance for the child. Morgenthaler 
showed that the willingness of an individual to accept a trait rating 
is little evidence for its validity. Newcomb (112) points to an inter- 
esting discrepancy between the impression created by direct reports 
of incidents and that which existed in the minds of raters. Wil- 
loughby (179) presents a scale of sixty items emphasizing maturity, 
objectivity, realism, intellectualism, etc. 

The Woodworth, Thurstone, and Bernreuter tests have so many 
items in common that they may well be discussed together (Studies 
20, 27, 28, 29, 30, 32, 35, 41, 80, 101, 133, 141, 142, 147, 163, 168, 
169, 187). Modifications have been prepared by Willoughby (178) 
and Murray (108). Terman (152) shows a reliability among the 
gifted group over five years of .42. As a rule no relationship is found
between score for neurotic tendencies in general and other factors 
investigated, such as school success, professional success, art ability, 
PGR, susceptibility to hypnosis, pacifist attitudes, race differences, 
etc. Results with delinquents are conflicting, depending on the 
extent to which intelligence and environment are constant in the 
delinquents and the controls, but the usual result is to show a some- 
what greater neurotic tendency among problem and delinquent chil- 
dren. Papurt (118) criticizes the traditional blank as too long, too 
complex in phraseology, containing irrelevant items, masculine in 
viewpoint, including much duplication. Harvey (60) points out that 
the method of weighting used in the Thurstone scale gives undue 
value to any types of question which happen to occur more often than 
others in the original questionnaire. Allport’s results agree with 
previous findings, that when such a test is administered as a part of 
college entrance testing, there is a strong tendency for subjects to 
minimize difficulties and to present a healthy appearance. 

New tests worthy of mention include the Rogers (127), which yields
diagnostic scores for personal inferiority, social maladjustment, 
family maladjustment, and daydreaming. The Maller character 
sketches ask an individual to choose among a variety of descriptions 
those which apply to himself. Sakellarion (131) elaborated the 
Tendler scale, giving opportunity to rate forty characteristics of 
emotional life by intensity, duration, and frequency. 

Introversion-extroversion interested investigators in these studies: 
14, 20, 30, 32, 41, 42, 51, 53, 54, 57, 64, 69, 80, 91, 93, 120, 128, 
133, 141, 148, 160, 169, 173; Bailey, Goldstein, Guilford and Braly, 
Hollingworth, Lebensart, McGeoch and Whitely. As with the 
neurotic indices, these measures are remarkable mainly for their
vague definition and lack of correlation with anything else. The 
present studies show them to have no significant relation to art ability, 
hypnotizability, choice of friends, ratings on pleasing personality or 
social adjustment, strength of patellar reflex, susceptibility to caffeine, 
perseveration, memorizing ability, physiological reactions, tendency 
to delinquency or school achievement in the elementary grades. 
Harris (57) and Flemming (42) agree that there is a correlation 
between introversion and ability to get grades in college amounting 
to .2 or .3. Sward (148) found priests in training in a Catholic 
seminary more introverted. Guilford and Hunt (53) examined 
McDougall’s idea that the rate of fluctuation in the Necker-Wheat- 
stone cube might be related to introversion, but found it uncorrelated 
with existing scales. The chaos of concepts continues, being well 
illustrated by the assumption of Guilford and Morgan that manic- 
depressives could be used as the criterion for extroversion and 
dementia praecox cases as the criterion group for introversion. 

Heidbreder, Payne (57), Casselberry (21), Smith (141) and 
White and Fenton (175) have each developed a scale for measuring 
inferiority feeling. Smith found this feeling correlated with ratings,
.4; delinquency, neurotic tendencies, .6; submission, .7; but not with
C.A., M.A., sex, socio-economic status, or school achievement. The 
Ascendancy-Submission scale (1, 70, 100, 160, McGeoch and 
Whitely) has little relationship to the Thurstone scale, to suggesti- 
bility within a group, to tendency to be chosen chairman of a group, 
or to memorizing ability. Wang (160) found thirty questions would 
do as well as the total. McLaughlin (100) selected twenty-five cases 
at each extreme and tried to modify them by clinical treatment, 
having more success with the submissives than with the aggressives. 

Sailer (130) found that the happier among five hundred young 
men (reliability of scale .82) were steadier in mood, in better health, 
better adjusted sexually, more sociable, more interested in religion, 
more approving of their work and work associates. Hersey (65), 
using self estimate and observer estimate, found 8 per cent greater 
production during cheerful moods. Kornhauser and Sharp (74) 
found that job insecurity and disagreeable supervisors were the main 
thorns in work feelings. Jasper (69) used a multiple choice test 
showing satisfaction with friendships, success of present economic 
order, etc. Other studies in this area are 22 and 101. 

Studies stressing some aspect of moral knowledge were: 28, 29, 
37, 48, 49, 88, 94, 97, 121, 138, 146, 155, 186, Mathews. The Lincoln 
and Shields (88) age scale is patterned after the Binet, with eight
individual tests for each age from six to twenty, ignoring the finding 
of previous investigators that moral knowledge was not primarily a 
function of age. The Tomlin “ Best Thing to Do” test (155) applies 
to grades IV to VIII. Strang’s (146) test of knowledge of social 
usage is intended for junior high school. Shuttleworth’s (138) 
investigation of the movies is interesting in showing sample items of 
misunderstanding about human relations found among frequent 
movie-goers. Several studies repeat the traditional finding that apart 
from relation of each to intelligence, there is no connection between 
moral knowledge and delinquent tendencies. 

Studies dealing with honesty are: 7, 24, 27, 28, 29, 79, 92, 95, 97, 
157, 188, 189, Campbell. Clevett (24) and Tuttle (157) achieve 
positive results in experiments to increase the honesty of children, 
but Zyve (189) found it more doubtful. Luria (92) presents 
fascinating evidence on the disorganization in motor performance 
brought about by the attempt of criminals to hide their guilt. 

Coöperation studies include 44, 76, 95, 97, 102. Arnal and del
Olmo (3) used Mira’s (102) test involving a dramatic appeal for 
someone to give a blood transfusion for victims of a motor accident. 
Kunze (76) used apparatus (scissors, punch, etc.) connected by 
levers so that subjects had to work in coöperation to accomplish given
tasks. Forlano (44) emphasized the relationship of degree of coöperation
to the type of appeal. Subjects were given opportunity to describe their
own persistence in Wang’s (159) scale, and to demonstrate it at
Bottstein’s (17) picture puzzles; they could demonstrate their will
power by the voluntary control of breathing (Tashjean). Other
studies are 28 and 97. Stoke and Lehman (145) measured industry 
by teacher rating, pupils’ time records, and record of reserve books 
used. 

Most of the application of behavior observation using short time 
samples has been carried out with kindergarten and nursery school 
children (5, 10, 13, 82, 87, 104, 111, 140, 161, Goodenough). 
Emphasis has been upon activities, contacts, negativism, talkative- 
ness, etc. Arrington (6) studied reliability, using talking pictures 
seen eight times by three observers. Restlessness was studied by 
Olson (116, 117) and Laird, Levitan and Wilson.

Attitudes on various modern social, economic, and political prob- 
lems were tested in 4, 15, 19, 33, 34, 37, 50, 52, 55, 56, 67, 71, 75, 
83, 86, 105, 133, 154, 163. The most extensive group of scales, that 
published by the University of Chicago Press (154), includes atti-
tudes toward God, the law, capital punishment, Chinese, Germans,
the United States Constitution, prohibition, patriotism, Communism, 
censorship, evolution, Sunday observance, criminals, birth control, 
etc. Harper’s (55) test of opinions and attitudes on international 
questions is the most thoroughgoing and extensive test in this field. 
Biddle (15) developed ingenious measures for susceptibility to 
propaganda, and showed that this gullibility could be reduced by a 
series of nine lessons. Correlation of gullibility and knowledge was 
—.36. Carlson (19) found little relation between information and 
certainty in political opinions. Gladys Watson (163) found one of 
the few differences between probably successful educators and prob- 
ably mediocre educators in the greater liberalism of the former (cor- 
relation .46). 

Terman (152) found the gifted less prejudiced, and Pintner (124) 
found that no other test differentiated good students in a practical 
course in psychological testing as well as the fairmindedness test. 
Droba (34) found among factors contributing to pacifism: being a 
woman, foreign parentage, better education, more liberal religious 
ideas, Socialism, social science training, absence of military service. 
He found (33) that a course on the Negro served to build more 
favorable attitudes. Liberals, in a study by Harris, Remmers and 
Ellison (56) were more intelligent, more apt to be males, tended to 
differ from patterns in politics, had more social science training, and 
were less interested in religion. Moore and Garrison (105) agreed 
in finding higher scholarship among liberals than among radicals. 
Peterson and Thurstone report in one study (122) one of a series 
of experiments in using the movies to change attitudes. More favor- 
able attitudes toward Germans, Chinese, race, prohibition, and less 
favorable attitudes toward crime, war, capital punishment and 
gambling have been built by this technique. Religious ideas have 
been measured in studies 16, 23, 31, 96, 109, 184, and in a disser- 
tation by MacLean. Woodward (184) tested the relationship 
between self-report on such religious attitudes as conservative beliefs, 
prayer habits, church activities, “ getting ” attitudes, growth attitudes, 
service attitudes, and healthy-mindedness, and such emotional pat- 
terns as sense of adequacy, self-consciousness, insecurity, worries, 
guilt, shame, dependence, family cooperation, rebellion in childhood 
and adult life. 

Comparing what an individual himself feels about a variety of 
moot questions and interests with what he thinks he ought to feel 
and what he thinks most other people feel, it is possible to deduce a 
considerable variety of relationship scores such as criticism of one’s
self, criticism of others, feeling of difference, insight into others, etc. 
Sweet’s test employing this principle was used in studies 7, 29, and 
149. Tyler (158) used the same principle with college students, 
finding, as did Sweet, high reliability but little relationship to success 
in school. Shaw (136) used estimated grades and test scores, aware- 
ness of adjustment patterns in one’s self, ability to write a self- 
description that agreed with the expectations of others, as measures 
of self-insight; intercorrelations among several measures were less 
than .2. Wolff (182, 183) analyzes the difficulty which people have
in recognizing their own voice, profile, hands, gait, imaginative 
stories, handwriting, association responses, etc. 

Allport and Vernon (2) offer a test of values supposed to cor- 
respond to Spranger’s six culture types. Pintner’s school opinion 
test was used in study 7. A test of interests differentiating the 
masculine from the feminine was used in study 152. Hawthorne (62) 
created a multiple choice test in which one alternative was always 
particularly bloody and violent. Washburne (162) secured three 
wishes from each of a thousand persons and compared the well- with 
the mal-adjusted. Wishes collected anonymously did not seem 
noticeably franker. 

The Kent-Rosanoff or some similar test of word association was 
used in studies 21, 63, 80, 90, 103, 133, 172a, 185; McElwee, Powers. 
Luh (90) established norms for Chinese children. Wheat (172a) 
standardized responses of 1,300 children to 25 of the commonest 
words in English. 

The Pressey X-O Test appeared in studies 28, 30, 42, 43, 114, 
160; McGeoch, E. Olson. Flemming (42) suggests a revised
scoring, using five degrees of response to each term, scored to give 
intensity, deviation beyond one P.E. of group, and consistency on 
repetition. Reliability was .7, but with no relation to grade. Despite 
the author’s repeated suggestion that quantitative score was meaning- 
less and qualitative analysis of responses the main value in the test, 
none of the studies has made use of this feature and none has found 
any significant relationships. 
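
The phrase "deviation beyond one P.E. of group" refers to the probable
error, which for a normal distribution equals 0.6745 times the standard
deviation. The Python below is a minimal sketch of flagging such deviations;
it is not a reconstruction of the revised scoring itself. The function
name, the use of the group mean as the reference point, and the choice of
the population standard deviation are assumptions made for illustration.

```python
import statistics

# Probable error of a normal distribution, as a multiple of its standard deviation.
PE_FACTOR = 0.6745

def beyond_one_pe(scores):
    """Return one flag per score: True if it lies more than one P.E. from the group mean.

    scores: list of numeric scores for a group (hypothetical data layout).
    """
    mean = statistics.mean(scores)
    # Population standard deviation is an assumption; the source does not specify.
    pe = PE_FACTOR * statistics.pstdev(scores)
    return [abs(s - mean) > pe for s in scores]
```

Because half of a normally distributed group falls within one probable
error of the mean, a rule of this kind would be expected to flag roughly
half of the responses in a large group.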

Luria’s suggestion (81, 84, 92, and other studies) that personality 
should be measured in terms of neurodynamic age seems to be an 
important contribution. This measure represents the degree of inte- 
gration and organization in the psychomotor processes. It is disor- 
ganized under threats; at its best it can use tools and other external 
aids in the service of integrated processes. No adequate quantitative 
test has been proposed, but results are based upon observation in
experimental situations. Kellogg’s study on steadiness during 
emotional situation, also Olson and Jones (114), have taken slight 
steps in the same direction. 

Developmental age is involved in Furfey’s (47) revision of his 
own test, in Plechaty’s (125) adaptation for girls and Weber’s 
revision of the Wells test. Resistance to distraction appeared in 
studies 28 and 97, reaction to irritants in 46. 

The phase of temperament which has attracted most attention has 
been suggestibility and hypnotizability. Despite Murphy’s (106) 
splendid discussion of the variety of different behaviors commonly 
grouped under this catch-all term, none of the studies (8, 30, 70, 73, 
78, 144, 160, 173, 174, 177, Baumgarten, Shands) have taken such 
distinctions into account. The principal techniques were body sway, 
Aussage, sensory illusions, progressive weights, accepting names 
offered for ink blots, susceptibility to majority opinion, and inability 
to recognize errors in familiar words and forms. No important rela-
tionships of any of these behaviors to anything else were discovered,
except for a correlation of —.4 between intelligence and willingness 
of small children to accept ink blot names proposed by an adult. 

The most international response has been given on the Rorschach 
test (11, 36, 45, 85, 87, 91, 113, 132, 176). In addition to the study 
of individual cases, these studies offer norms for feebleminded, 
delinquents, manic-depressives, and suggestions for use in differenti- 
ating leaders from followers, the mentally abnormal from dissem- 
blers, psychogenic from traumatic psychoses. 

Ewen finds a more rapid fluctuation of the ambiguous figures 
(cube, staircase) among schizophrenics than among cyclics. 
Herwig (66) explores “work type” in the Continental manner, 
noting qualitatively how the individual reacts when asked to work 
under pressure, to carry out monotonous tasks, etc. Langelued- 
deke (78) used the Aussage test and found the schizophrenics more 
suggestible, the cyclothymics more responsive to color. Reiter and
Stertzinger proposed to use the tachistoscope, as has been done by 
Kroh and Pfahler (123) to demonstrate the concentration and nar- 
row range of attention in schizothymics as contrasted with the fluc- 
tuation and wide range in cyclothymics. Ritter emphasizes the same 
distinction between those who react to form and those who react to 
color previously presented by Kretschmer, Enke, Pfahler, and others, 
and identifies the color response with Jaensch’s integrated type. 

A temperament test of extraordinary promise has been developed 
by Stephenson (143) to correspond with the p-concept of Spearman,
perseveration. Babcock (7) found this test the best of all her list 
for differentiating delinquents from non-delinquents. A similar 
psychological factor was involved in tests used in studies 68, 126, 129, 
Line and Kaplan. 

The Downey Will-Temperament test, due to many studies of its 
unreliability, has at last practically disappeared from the litera- 
ture (26), but some of the same concepts are being expanded into 
more reliable tests. Cowley has elaborated a variety of tests of 
speed of decision, and Duffy (38, 39), Freeman and Katzoff (46)
found interesting results on muscular tension during psychological 
processes. Duffy found a correlation of —.7 between intentional 
social contacts of a small group of nursery school children and their 
muscular tension during discriminations. 

The only test devised particularly for the study of delinquents 
during the period of this review is that of Schwartz (134) which 
presents eight pictures of boys and girls in situations suggesting 
delinquency, but this test is not quantitatively scored. A favorite 
pattern for dissertations, etc., is to give a battery of tests to delin- 
quents and non-delinquents (7, 21, 28, 29, 62, 80, 142, 186). Unless 
intelligence and home background are controlled, these are always 
found to be differentiating factors. Often there are more symptoms 
of maladjustment on self-report questionnaires. Recreational inter- 
ests are found to differ. The above studies also showed delinquents 
manifesting more perseveration, a more conventional idea of the 
right on the Sweet test, more worries on the Pressey X-O, more 
cruelty on the Hawthorne test, more typically delinquent associations 
on the Casselberry and Laslett list, more inferiority on the Smith 
scale. No differentiation was found between groups compared by the 
use of the S-A lying test, a cheating test, the Pintner school opinion 
test, other scores on the Sweet test, moral knowledge test, deception 
test, Otis suggestibility, story inhibition test, visual acuity tests, 
mechanical ability tests, etc. 

Partridge (119) developed an admirable technique for discover- 
ing leadership in boys. A Scout troop was divided into a series of 
small groups such that all combinations appeared, and each group 
chose its own leader for each of a variety of games. Vote for troop 
leader was also used as a measure. A combination of these indices 
correlated nearly .8 with intelligence. Cowley (26) studied persons 
in positions of leadership in the army and in college, and found them 
rating themselves higher in self-confidence, scoring high on the 
Downey test in motor impulsion and speed of decision. The Moore-
Gilliland test of aggressiveness applied by Jenness (70) showed no 
relationship to suggestibility of students in the group, and no relation- 
ship to a tendency to be chosen as group chairman. Luithen had 
subjects cooperate in building sentences from words, writing stories, 
arranging pictures, solving jig-saw puzzles, carrying out a disagree- 
able task, packing a trunk, reading an instrument, meeting unantici- 
pated obstacles, etc. He was not interested in a numerical score, but 
in the analysis of the human relationships, and especially with refer- 
ence to various type combinations. 

In German psychotechnics it is fairly common to set up test situ- 
ations which approximate the working situation. One mentioned by 
Herwig (66) which has possibilities for further use in personality 
testing is the ability of an applicant to handle a ticklish situation 
represented by a standardized telephone complaint. 

There is every reason for encouragement in the progress which 
is taking place. The most unfortunate feature is the large number 
of studies in which some self-description blank for symptoms or 
opinions is used, with little attention to the meaning of the test for 
the problem and with little attention to the conditions of administra- 
tion from the standpoint of securing frankness and cooperation. 
Such tests seem to be often given because they are easy to give. The 
results of these endeavors are, of course, slight. The promising 
feature of our present situation is the attention being given to the 
careful observation of conduct in more or less controlled situations. 
Ratings come to be more and more observations of actual behavior, 
and less and less based upon imaginary traits. Test situations per-
mitting actual conduct that is honest, coöperative, persistent, sug-
gestible, organized, tense, self-controlled, etc., are multiplying. Some
good thinking and experimenting is being directed toward the dis- 
covery of differences in underlying temperament. There is a slight 
increase, although not yet approaching the most desirable point, of 
studies in which groups of persons are examined in their interrela- 
tionship, recognizing that character does not exist in isolated indi- 
viduals, but is a function of a social situation having a certain 
structure. 
INTELLIGENCE TESTS


BY RUDOLF PINTNER 
Teachers College, Columbia University 


The last summary of this field appeared in this journal in 
February, 1932, and covered the references for the year 1930. The 
present summary attempts to cover the articles on intelligence tests 
for the years 1931 and 1932, but it does so in a much abbreviated 
form, because of limitations of space. The writer has collected over 
330 references, and the printing of these alone would fill his entire 
allotment of space in this number. He has, therefore, been forced to 
make a rigid selection. He has omitted reference to chapters on 
intelligence testing in text-books, and also many articles dealing with 
the relationship between character and intelligence tests, as well as 
other articles in which the intelligence testing seemed to be of 
secondary importance. Intelligence tests are now given as a matter 
of routine in many investigations. They are regarded as funda- 
mental for the equating of groups in educational experiments. Such 
references have been omitted. Furthermore, the writer has attempted 
to select for mention by name in the text only a few of the more 
important articles in each phase of the work. The others are men- 
tioned by number. 

General. Pintner (168) has revised and greatly enlarged his 
book on intelligence testing, bringing his bibliographies at the end of 
each chapter up to date. No other book dealing exclusively with 
intelligence tests has appeared, but chapters are to be found in books 
dealing with measurement in general, e.g., 216, 179, 208, 83. 
Specific problems are discussed by various authors in Murchison’s 
Handbook of Child Psychology (152), and there are chapters in 
Murphy and Murphy (153) and Scheideman (188). A general
survey of the work in America for German readers is given by 
Lietzmann (124). Binet’s general contribution to testing is discussed 
by Bertrand (16) in his life of Binet. Historical and bibliographical 
items are also found in 102, 29, 172. 

Discussion as to the meaning of intelligence still centers mainly 
around the Spearman Two-Factor Theory or else degenerates into 
arm-chair definitions of intelligence and criticisms of the tests: 198, 
199, 236, 228, 215, 143, 209. Conrad’s (40) criticism of the inade-
quacy of present tests for adults is good, particularly so because he
follows it up with 24 requirements for the ideal intelligence test for 
adults. 

Relation of Intelligence to Other Factors. Correlations between 
intelligence tests and character tests, personality tests and ratings are 
now becoming numerous, e.g., 109, 171, 70, 79, 69, 21, 154. Most of
these comparisons refer to college students and with such homogene- 
ous populations the correlations are mostly about zero. There is a 
slight tendency for intelligence and introversion to go together. 

There are many studies which give the relation between intelli- 
gence and other factors, ranging all the way from transfer ability in 
arithmetic to attendance at motion pictures. Many of these studies 
are of little or no value to the student of intelligence testing, but they 
show how an intelligence test of the subjects of an experiment is 
coming to be regarded as a necessity. Some of these studies are of 
great importance in their implications. The “ whole-part” problem 
in learning must be gone over again from the point of view of the 
intelligence of the learners, for McGeoch (144) finds that the I.Q.
is a factor conditioning the relative efficiency of the whole, progres- 
sive part and pure part methods. Again, Overman (159) substanti- 
ates what Thorndike has already pointed out, namely, that amount 
of transfer from one function to another is mainly dependent upon 
amount of intelligence. The other factors, with which intelligence 
has been correlated, can mostly be recognized from the titles of the 
following references: 147, 28, 127, 115, 233, 35, 191, 19, 229, 50, 
113, 211, 27, 30, 125, 85. 

Growth and Constancy of Growth. The most important contri- 
bution to our knowledge of the growth of intelligence is by Miles 
and Miles (150), who find a slight drop from the twenties to the 
forties and then a much more rapid drop to old age. Further discus- 
sion is found in 225, 235, 11, 106. 

Constancy of growth is examined by means of re-test correlations, which
range from .72 to .91, in 18, 175, 122, 58, 189, 155.

Cattell (37) demonstrates the effect on the re-test correlation of 
different intervals from 3 to 72 months. Even when different group 
tests are used for re-tests, the I.Q. is surprisingly constant according 
to Lincoln and Wadleigh (126). 

Influences upon Intelligence Ratings. Jones et al. (112) discuss 
the handicap of rural environment on the Stanford-Binet and some 
performance tests. Freeman (74,75) takes up the influence of 
speed and Snedden (196) the influence of practice. Smith (195)
finds that severe illness during infancy retards later development. 

Some Technical Questions. Factor analysis by the tetrad-differ- 
ence method is discussed by Garrett and Anastasi (78), and is made 
use of by Peatman (162) and Brolyer (25,26). The problem of 
test-item analysis is thoroughly discussed and demonstrated by 
Brigham (23) and by Barthelmess (12). Foran (71) discusses 
various methods of measuring validity and Furfey and Muehlen- 
bein (77) discuss the validity of tests for infants. No definite effect 
of the fore-exercise is found on test reliability by Egan (65). Relia- 
bility of the Goodenough for subnormal children is reported by 
McElwee (142). Thomson (206) discusses problems of standardization,
and Cattell (38) the equivalence of I.Q.s on different tests.

Individual Scales and Tests. Several new scales have appeared: 
The Minnesota Pre-School Scale (Goodenough et al., 86), the 
Merrill-Palmer Scale (Stutsman, 204), the Randall’s Island Per- 
formance Series (Poull, 178), the Passalong Test (Alexander, 3), 
an object fitting test (Atkins, 8), a short oral test (Kent, 116). 

The Stanford-Binet has been revised for use in India by 
Richey (182), and a Hindustani Binet Performance Scale has been 
constructed by Rice (181). 

Other references dealing with well-known scales, or with com- 
parisons between such scales are: 107, 226, 67, 231, 9, 166, 132, 163. 

Group Tests. New group intelligence tests have been constructed 
by Henmon and Nelson (99), by Pintner (174), by Sleight (194), by 
Otto (158) in Germany. Revision and modifications of existing 
tests are reported in 219, 149, 217. 

The School Pupil. Homogeneous grouping is still being dis- 
cussed: 205, 44, 20. Testing in German schools (56, 161) seems to 
be slowly following the American pattern. The Germans are slowly 
moving toward greater objectivity in scoring and toward being 
sceptical of the subjectivity of teachers’ marks, which they still rely 
upon to a much greater extent than we do in this country. Australian 
testing is following the American pattern, as illustrated in the inter- 
esting report by Wyndham (237). Testing in England is illustrated 
by Amos (4). 

The excellent reports of the Educational Records Bureau (61, 62, 
63) give us a good picture of the high mentality of the private school 
child. The Gray and Ayres (89) report finds a mean I.Q. of 114 
for 601 private school pupils. 

Extensive surveys of high school pupils in various states are 
reported in 98, 81, 82, 45. Miscellaneous studies of elementary and
high school pupils are to be found in 114, 202, 121, 97, 72, 57, 66, 92, 
117, 103. 


The College Student. Nothing particularly new has appeared in 
this field, although there are many reports of interest. With refer- 
ence to teachers’ colleges, it should be noted that Upshall and 
Masters (213) find a correlation of +.33 between Thorndike Intelli- 
gence Score and practice teaching. Most other workers have found 
no correlation between these two factors. However, the correlation
between success in the field and intelligence is zero, and that between
practice teaching and success in the field is only +.27. The other
references to college testing are 157,
33, 76, 227, 110, 54, 223, 46, 22. 

The Superior. There are four studies in this field dealing with 
miscellaneous problems, but adding nothing essentially new: 34, 53,
101, 180.


The Feebleminded and Dull. A very good study of the mortality 
of the feebleminded has been made by Dayton (51), who finds a 
high rate for idiots and imbeciles, but for morons a rate only slightly 
higher than that of the general population. Birth injury may be a 
causative factor in mental deficiency to a greater extent than we have 
hitherto believed, according to Doll et al. (55), who have made a
detailed study of twelve cases. McGhie and MacPhee (145) give 
the median M.A. for various occupations in an institution, and 
Town (210) follows up the feebleminded who have been released 
from an institution on parole. She finds that very few make satis- 
factory adjustment and she is very critical of this method of placing 
out the feebleminded in the community. A questionnaire survey of 
588 special classes by Witty and Beaman (232) covering eleven 
thousand children shows the average I.Q. to be 63. Other refer- 
ences dealing with the feebleminded and dull are 91, 111, 120, 64, 2, 
52, 14, 220, 160. 

Delinquent and Problem Cases. The report by Butcher et al. (31) 
brings out the fact that many delinquent boys have brothers brought 
up in the same environment, who are not delinquent. The median 
I.Q. for the problem cases is 75, while the I.Q. for their brothers is 
86. Interest is shifting to the difference in non-intellectual traits 
between delinquents and non-delinquents, as exemplified by the 
studies of Daniel (47) and Babcock (10). The latter finds her 
delinquent group to be more inferior on verbal than on non-verbal 
intelligence tests. Ackerson’s (1) report on five thousand consecu- 
tive cases at the Chicago Institute of Juvenile Research contains 
much important information. Other studies in this field are: 73, 224,
59, 197, 13, 140, 141, 148, 5, 105. 

The Handicapped. The White House Conference (222) devotes 
a book to all types of handicapped children, which gives some data 
on intelligence test results. The deaf are dealt with by Pintner (169), 
who shows that young deaf children just beginning school can now 
be tested in groups by means of his new Primary Non-Language 
Test. Shirley and Goodenough (192) report on most of the deaf in 
Minnesota schools, finding a median I.Q. of 88 on the Goodenough 
Test and 98 on the Pintner Non-Language. The peculiarity in this 
report is the eccentric behavior of the deaf children in public schools, 
who score abnormally high on the Pintner Non-Language and abnor- 
mally low on the Goodenough Test. The children in the state resi- 
dential school score about the same on both tests. 

For the first time we now have some reliable information on the 
intelligence of the hard of hearing child, which seems to be slightly 
below that of the hearing child. This result is reported both by 
Madden (131) and by Waldeman et al. (214). The mean I.Q. of
1,480 crippled children is 85 according to Witty and Smith (234).
Lee (123) finds a similar I.Q., and Winkler (230) in Germany finds
them retarded in intelligence as compared with ordinary school 
children. 

Racial Comparisons. Studies in this field continue to be numer- 
ous. Garth (80) devotes a book to the subject and summarizes much 
of the work already done. He tends to explain away differences 
between racial groups and assumes that the I.Q. is greatly affected 
by schooling. Porteus (176) contributes an interesting picture of 
Australian aborigines. Few of our tests seem to be adapted for 
testing such people. On performance tests, which seem best adapted 
to them, the scores for the Australians are extremely low. Kline- 
berg’s (118) study is notable, inasmuch as it is one of the very few 
studies making racial comparisons by means of non-language tests 
in the country of origin of the groups tested. He attacked the Nordic 
myth by testing Nordic, Mediterranean and Alpine groups in 
France, Germany and Italy. He could find no Nordic superiority. 
Pintner (173) presses further the problem of language handicap 
among children from bilingual homes in this country. He shows that 
they are handicapped even on non-verbal tests with simple English 
directions. 

Seven reports (185, 186, 137, 139, 135, 49, 95) deal with tests 
of Spanish-speaking, mainly Mexican, children in this country, showing
in general rather low I.Q.s. Two reports by Hastings (93, 94)
give the results of the Army Beta for secondary schools in Mexico. 

Seven reports (164, 39, 136, 146, 15, 48, 100) deal with tests of 
negro children, confirming in general previous results of negro-white 
comparisons. 

A very convenient summary of the intelligence testing of Jews 
is given by Maller (134). Kolb’s (119) report of the use of non- 
language and performance tests with would-be immigrants in Europe 
is valuable. Other miscellaneous racial studies are: 151, 96, 221, 184, 
128, 177, 129, 138. 

Employment and Guidance. A survey of the intelligence of 
nurses in training in Canadian hospitals is reported by Weir (218). 
The mean I.Q. for 2,280 cases is 98.3 with a range from 65 to 125. 
The author comments on the danger to the public of so many nurses 
of low grade intelligence. McPhail and Joslin (130) give results 
for about three thousand nurses in the United States. The value of 
mental tests in an extensive guidance program carried out by the 
Institute of Industrial Psychology in London is reported on by 
Earle (60). Other references in this field are: 87, 17, 108, 212. 

Sex Differences. Armstrong (7) compares boys and girls on 
three intelligence tests and finds no differences in means or sigmas. 
St. John (200) compares a group of boys with a group of girls of 
equal intelligence and finds the boys 7 per cent worse than the girls 
in grade progress in school. 

Inheritance. Three studies deal with identical twins, New- 
man (156), Carter (36), and Stern (203). Sanders (187) reports 
on two sets of triplets, and Brintle (24) on a set of quadruplets. 
Three studies deal with various aspects of social status: 84, 193, 6. 
Burks and Tolman (32) do not find that siblings, who resemble each 
other physically, are more alike in intelligence than those who do not. 
Conrad (41) points to the dangers of comparing correlations from 
mental with correlations from physical traits. 

Miscellaneous Topics. Steckel (201) finds that parental age 
influences the I.Q. of offspring, but Finch (68) does not, and criticizes
Steckel’s data. Thurstone and Jenkins (207) find that order
of birth influences the I.Q., the I.Q. becoming higher with the later 
born, but Hsiao (104) in a very thorough comparison of first and 
second born children finds no difference in intelligence and casts 
doubt on the influence of order of birth on intelligence. He shows 
the complexity and difficulty of the problem. Maller (133) reports 
a correlation of —.21 for intelligence and size of family, but Conrad 
and Jones (43) find no such correlation in certain rural areas.
Pintner (170) presents data for intelligence and month of birth. He 
finds no reliable differences between months and seasons, but the 
mean for the cold months is lower than that for the warm months. 
Other references on miscellaneous topics are: 88, 90, 42, 165, 183, 
167, 190. 

STUDIES OF THE TRUE-FALSE EXAMINATION
BY LUCIEN B. KINNEY AND ALVIN C. EURICH


University of Minnesota 


When and where tests including true-false statements were first 
used appears to be unknown. Ever since attention was focused upon 
objective examinations, the true-false type has maintained a rather 
dominant place. It has been casually inspected, moderately advo- 
cated, severely criticized, modified in form, and duly evaluated. 
However, final judgment in regard to its serviceability as a measur- 
ing instrument must rest upon the results of careful, scientific 
scrutiny. To provide a comprehensive picture of the available 
evidence is the scope of this review. Obviously, all the investigations 
are not equally meritorious but the composite results portray a more 
significant trend than can be derived from the generalizations of a 
single study. 


THE VALUE OF THE TRUE-FALSE EXAMINATION 


A number of the earliest studies dealing with the characteristics 
of the true-false examination were concerned principally with its 
merit as a device for measuring information. After giving true-false 
tests to graduate and undergraduate students registered in educa- 
tional psychology classes, Gates (26) concluded that the true-false 
test has a number of advantages as a measure of achievement. 
Among these he mentioned economy of time, amount of ground that 
can be covered in a test, and its value as a teaching instrument. 
Knight (34), with graduate and undergraduate students in college 
physics, reached a similar conclusion. Batson (5), working with 
undergraduate classes in education, concluded that the true-false test 
could be substituted for the essay examination. Weinland (71) has 
pointed out the value of the true-false examination when used as a 
check on instruction. By a tabulation of the percentage of incorrect 
responses to each question the need for emphasis in teaching may be 
located. Cocks (17) compared the true-false test with several other 
types in respect to the amount of information imparted to the pupils 
in the process of testing. He found that the true-false test had the 
most “pedagogical value.”
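
The tabulation Weinland describes can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name `percent_incorrect`, the answer key, and the four pupils' responses are all hypothetical and are not taken from any of the studies cited.

```python
from typing import List


def percent_incorrect(responses: List[List[bool]], key: List[bool]) -> List[float]:
    """For each item, return the percentage of pupils who answered it wrongly.

    responses[p][i] is pupil p's answer (True/False) to item i;
    key[i] is the correct answer to item i.
    """
    n_pupils = len(responses)
    rates = []
    for i, correct in enumerate(key):
        wrong = sum(1 for pupil in responses if pupil[i] != correct)
        rates.append(100.0 * wrong / n_pupils)
    return rates


# Hypothetical data: four pupils, four items.
key = [True, False, True, False]
responses = [
    [True, False, False, False],
    [True, True, False, False],
    [True, False, False, True],
    [False, False, True, False],
]
rates = percent_incorrect(responses, key)
# Items with the highest percentages mark topics needing more emphasis.
needs_emphasis = [i + 1 for i, r in enumerate(rates) if r >= 50.0]
```

In this made-up example the third item is missed by three of the four pupils, so it is the one flagged for further teaching; the 50 per cent cut-off is an arbitrary choice for the sketch.
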

A considerable number of experiments have been carried out to
compare the reliability, validity and discriminative ability of the true- 
false test with the other objective types and with the essay test. 
Since these studies have been included in another summary¹ they
will not be repeated here. In general, when the true-false test was 
compared with the recall, multiple-choice, and the essay types it was 
found to be inferior to the two objective tests but at least equal to 
the essay examination. 

The possibility that inaccurate statements are fixed in the minds 
of the pupils by the false statements of the test has been investigated 
by Remmers and Remmers (52) and by Roberts and Ruch (55). 
The former used two groups. An unfamiliar reading passage was 
followed in Group I with a true-false test on the subject-matter of 
the passage, and in Group II with a recall test. After four weeks 
the groups were retested in the opposite manner. Since the mean 
score on the final test for Group I was higher than for Group II, no 
permanent residue of misinformation from the true-false test was 
apparent. 

To test the possibility that there may have been a certain amount 
of misinformation balanced by an equal or greater amount of correct 
information, Roberts and Ruch gave a completion test at the outset 
of a similar experiment, followed by a true-false test. The items on 
the final recall test, given at a later date, were checked individually 
to discover the changes resulting from the true-false test. A slight
negative effect from the true-false test became evident, but it was
more than balanced by the positive effect.


THE TECHNIQUES OF CONSTRUCTION 


Weidemann (68,70) studied 175 examinations and standard 
tests, containing 17,047 true-false items, to investigate the factors 
of importance in preparing and administering the true-false exam- 
inations. As a result of his study, Weidemann published his recom- 
mendations on administrative details and language considerations in 
connection with this type of test. 

The true-false questions, numbering over 10,000, that were sub- 
mitted for the nation-wide contest conducted by Ruch and Rice, have 
been studied by Brinkmeyer (11) and by Brinkmeyer and Ruch (12), 
with particular reference to language difficulties that may influence 
the responses. Brinkmeyer investigated the questions to determine
whether any relationship existed between the length of a statement in
words and the truth or falsity of the statement. While short statements
were found to be almost equally true or false, long statements (twenty
words or over) tended to be true in about two-thirds of the cases.

¹ “A Summary of Investigations Comparing Different Types of Tests.”
School and Society, 1932, 36, 540-544.

Brinkmeyer and Ruch tabulated the distribution of determiners 
(clues afforded by the phrasing or length of statement that tend to 
determine the pupil’s response in the absence of knowledge). They 
compared their results with those of Weidemann (68). In both 
instances it was concluded that the control of specific determiners is
desirable and necessary, and must be accomplished by careful study
of their use. For example, Brinkmeyer and Ruch found that two
out of three cause or reason, and three out of four “always” or
“never” statements are false.

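
A test maker could screen draft items for the two kinds of clue mentioned above: “always” or “never” wording, and statements of twenty words or more. The sketch below is a minimal illustration, not a procedure used by Brinkmeyer, Ruch, or Weidemann; the function name, the flag labels, and the sample statements are hypothetical.

```python
import re

# Cue words named in the text; a fuller list would be an editorial choice.
ABSOLUTE_WORDS = {"always", "never"}
# Statements of twenty words or over were found to be true about two-thirds of the time.
LONG_STATEMENT_WORDS = 20


def flag_determiners(statement: str) -> list:
    """Return a list of possible specific determiners found in one statement."""
    words = re.findall(r"[A-Za-z']+", statement.lower())
    flags = []
    if any(word in ABSOLUTE_WORDS for word in words):
        flags.append("absolute term")
    if len(words) >= LONG_STATEMENT_WORDS:
        flags.append("long statement")
    return flags


# Hypothetical draft items.
short_item = flag_determiners("Iron is a metal.")
absolute_item = flag_determiners("Water always boils at the same temperature.")
long_item = flag_determiners(" ".join(["word"] * 20))
```

A flagged item is not necessarily faulty; the flags only indicate where the proportion of true and false statements deserves a second look.
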
Walker (67) has proposed and answered by mathematical
analysis several questions of technique in the construction of true-false
tests. Among them are: (a) Should exactly half of the statements
be true and half false? (b) If not, how much deviation is
permissible? (c) Should the score be corrected for guessing?


CORRECTIONS FOR GUESSING


The last question proposed by Walker is one that has given rise
to much discussion and some experimentation. The procedure for
correcting the true-false test scores for guessing, as described in
McCall’s How to Measure in Education, was seriously challenged in
1922 by Hahn (29) and by Chapman (13). The former criticized
the scoring technique because it was based on the assumption that
the pupil guessed an even number of times, that half of the guesses
were correct, and that every wrong item was regarded as a guess.
The latter, using a hypothetical situation, showed how the correction
for guessing might lead to a serious injustice.

These criticisms led to discussions by Barthelmess (3) and by 
Odell (47) who pointed out that the theory of probabilities applies 
only to large numbers, and that the correction gives the most probable 
number of correct answers. Asker (2) showed that if P represents 
the probability that an event will happen, and Q that it will not 
happen, then $(P+Q)^n$ expanded by the binomial theorem will give,
in the terms of the series, the probability that it will happen n times,
n-1 times, etc. Using this principle, he arranged a table to show the
probability that a pupil will guess a certain number of items correct 
in any given number of trials. Richards and Kohs (54) used a 
similar method to set up tables of probabilities including
four-response types of tests.
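
The terms of the binomial expansion described above can be computed directly. This is a minimal sketch of the principle, not a reproduction of Asker's or Richards and Kohs's tables; the function name and the choice of small examples are assumptions made for illustration.

```python
from math import comb


def guess_distribution(n_items: int, p_correct: float = 0.5) -> list:
    """Probability of guessing exactly k items right, for k = 0 .. n_items.

    Each entry is one term of (P + Q)^n, with P = p_correct and Q = 1 - P.
    """
    q = 1.0 - p_correct
    return [
        comb(n_items, k) * p_correct**k * q ** (n_items - k)
        for k in range(n_items + 1)
    ]


# True-false items: P = Q = 1/2. Four pure guesses.
true_false = guess_distribution(4)
# Four-response items: P = 1/4, Q = 3/4. Two pure guesses.
four_response = guess_distribution(2, p_correct=0.25)
```

For four true-false guesses the terms are 1/16, 4/16, 6/16, 4/16 and 1/16, so two right is the most probable single outcome but occurs only three times in eight.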

Holzinger (30) presented a mathematical proof of the rather 
obvious fact that if all pupils are allowed to finish all the items, or if 
omissions are counted wrong, no difference in ranking is brought 
about by a correction for guessing. 
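
The reason can be seen in one line of algebra. The following is a short sketch rather than Holzinger's own proof; the symbol N (the number of items, all attempted or with omissions counted wrong) is introduced here for illustration, while R, W and n are the number right, the number wrong, and the number of possible responses.

```latex
% With every item answered (or omissions counted wrong), W = N - R.
% True-false case (n = 2):
R - W = R - (N - R) = 2R - N
% General case with n possible responses:
R - \frac{W}{n-1} = R - \frac{N - R}{n-1} = \frac{nR - N}{n-1}
```

Since N and n are the same for every pupil, the corrected score is an increasing linear function of R, and the pupils keep the same rank order under either scoring.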

Ruch (57) has pointed out that the mathematical justification of 
the correction lies in the assumption that any response not based on 
correct knowledge is a sheer guess. He believes that this is rarely 
the case. He adds that the following categories of items may be 
present in the ordinary test: 


a. Items on which the pupil is absolutely sure. 
b. Items on which he is not absolutely sure, but is not in serious 
doubt. 
c. Items on which there is a grave doubt.
d. Items on which he is totally ignorant, and his response to
which is a sheer guess.
e. Items on which the pupil is misinformed. 


In view of these various possibilities, Ruch believes that the value 
of the correction is to be settled by experimental technique. 


Several studies have been reported which tend to substantiate 
Ruch’s point of view on the lack of a clear-cut distinction between 
the mental state of ignorance and certainty. Boyd (7) required 
ninety college students to record the state of knowledge on each item
of a true-false test as “certain,” “doubtful,” and “guess.” From a
comparison of the proportion of mistakes in each sort of item, he 
concluded that there is no sharp distinction between states of knowl- 
edge. Brinkmeyer and Keyes (10) used fifty statements that were 
obviously true, twenty-five plausible but false, and twenty-five 
obviously false. The pupils were required to put a question-mark 
after statements on which they were guessing. It was found that 
of the 94 high school pupils and 86 university pupils who took the 
test, those not guessing were correct in 81 per cent of the cases, and 
those guessing were correct in 70 per cent. The authors concluded 
that guessing is so universal that pupils are unable to distinguish 
guesses from actual information. 

Granick (27) used “index” items, made up of imaginary facts 
concerning which the pupil could not have any information. The 
pupils were asked to indicate with a “G” the items on which they 
had guessed. In view of the fact that less than half of the index 
items answered were recorded as guesses, the conclusion is drawn 
that the pupil is not capable of determining by introspection the
amount of guessing. 

As a further interesting outcome of the study, it was found that 
in the plausible but false items, the “true” responses outnumbered 
the “false” by three to two. The ratio of “true” to “false” in
pupils’ guesses has been further studied by Fritz (25) and 
Leker (37). Fritz used 194 college students who were given 58 
statements prepared from the Encyclopedia Medica, so technical that 
any response was a sheer guess. Almost 61 per cent of the state- 
ments were marked “true.” Leker kept a record of the true-false 
statements marked by 40 juniors and seniors in the Porto Rico 
Polytechnic Institute. He found that students guessed in accordance 
with predetermined tendencies. While a strong tendency to mark 
statements true was apparent for the group as a whole, a difference 
in this respect obtained among the students. Consequently, the score 
for a given student will depend somewhat on the proportion of true 
and false statements in the test. 
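
The dependence of a guesser's score on the make-up of the test can be made concrete with a small calculation. The sketch below assumes a pupil who guesses on every item and marks “true” with a fixed probability; the figure of .61 is borrowed from Fritz's group result purely as an illustration, and the function name and the three test compositions are hypothetical.

```python
def expected_guess_score(prop_true_items: float, prob_mark_true: float) -> float:
    """Expected proportion of items right for a pupil who guesses throughout.

    A guess is right when the pupil marks "true" on a true item
    or "false" on a false item.
    """
    right_on_true = prop_true_items * prob_mark_true
    right_on_false = (1.0 - prop_true_items) * (1.0 - prob_mark_true)
    return right_on_true + right_on_false


BIAS = 0.61  # illustrative tendency to mark "true"

balanced = expected_guess_score(0.50, BIAS)     # half the statements true
mostly_true = expected_guess_score(0.70, BIAS)  # 70 per cent true
mostly_false = expected_guess_score(0.30, BIAS) # 30 per cent true
```

Under these assumptions the biased guesser expects half the items right on a balanced test, somewhat more when true statements predominate, and somewhat less when false statements predominate.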

The investigations arranged to determine the advisability of 
correcting the scores on the true-false and multiple-choice tests are 
summarized in Tables I and II. In nearly every case, the comparison 
involves the reliability and validity of the scores resulting when the
formulae $R - \frac{W}{n-1}$ and $R$ are used, where “R” represents number
right, “W” number wrong, and “n” is the number of possible responses
(in the case of the true-false, “n” is 2, and the first formula becomes R - W).
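
The correction formula, number right minus number wrong divided by one less than the number of possible responses, is simple enough to state as a function. This is a minimal sketch; the function name and the sample scores are hypothetical, and omitted items are assumed to be left out of both counts.

```python
def corrected_score(right: int, wrong: int, n_choices: int = 2) -> float:
    """Score corrected for guessing: R - W / (n - 1).

    right     -- number of items answered correctly (R)
    wrong     -- number of items answered incorrectly (W); omissions excluded
    n_choices -- number of possible responses per item (n); 2 for true-false
    """
    if n_choices < 2:
        raise ValueError("an item needs at least two possible responses")
    return right - wrong / (n_choices - 1)


# True-false paper: the formula reduces to rights minus wrongs.
true_false_score = corrected_score(right=40, wrong=10)
# Four-response multiple-choice paper: each wrong answer costs one-third of a point.
multiple_choice_score = corrected_score(right=40, wrong=12, n_choices=4)
```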

A variety of criteria have been used for validity. Arnold (1), 
DeGraff (20), Charles (14), and Ruch and Stoddard (60) used 
recall tests covering the same subject-matter as the true-false and 
multiple-choice tests. Paterson and Langlie (49) used average 
scholarship as the criterion. Wood (75) used a vocabulary test in 
French, term average in Law, and the average of all other medical 
courses in Anatomy. Martin (41) used teachers’ marks and test 
scores. Brinkley (9) used test scores, teachers’ and pupils’ judg- 
ments and class marks. 

The reliability is usually determined by correlating the scores on 
the odd items with those on the even items, with or without the 
Spearman-Brown correction. 
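
The odd-even procedure with the Spearman-Brown correction can be sketched as follows. The code is an illustration under stated assumptions, not the computation used in any particular study cited here: the function names are hypothetical, items are scored 1 (right) or 0 (wrong), and the three pupils' papers are made up.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half: float) -> float:
    """Estimate full-length reliability from the half-test correlation."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(item_scores: list, corrected: bool = True) -> float:
    """Correlate scores on odd-numbered items with scores on even-numbered items."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return spearman_brown(r_half) if corrected else r_half


# Hypothetical papers: three pupils, four items each.
papers = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
reliability = split_half_reliability(papers)
stepped_up = spearman_brown(0.5)
```

The correction matters because each half is only half as long as the whole test; a half-test correlation of .50, for example, corresponds to an estimated reliability of about .67 for the full test.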

The trend of the results seems to indicate that correction for 
guessing raises the validity of the scores and lowers the reliability. 
While there is a general agreement on these points, the differences 
obtained by the two scoring formulae are in most cases very small. 

When true-false tests were first used in the classroom there was
a tendency to direct students to guess on all items they did not know. 
Studies made by Ruch and DeGraff (59) and Foster and Ruch (24) 
comparing true-false tests with different instructions show that the 
reliability and validity of these tests are slightly higher when the 
instructions are “not to guess” than with the directions “to guess.”

TABLE I
STUDIES CONCERNED WITH THE CORRECTION OF TEST SCORES FOR GUESSING

Column headings: Bibliog. No.; Date; Types of Test¹; Subject Matter; Grade Level; Number of Cases; Comparisons²

Entries legible in each column, in order of appearance:

Bibliog. No., Date, Types of Test: 77, 21, AA; 72, TF; then TF, MC, TF, TF, MC, TF, TF, TF, TF, TF, MC, TF
Subject Matter: Intelligence; Nonsense; Synonym; Antonym; History; Intelligence; History; Psychology; History and Social Science; French; Law; Anatomy; Geography; School Law; History; Civil Gov't; Educ. Psych.; American History
Grade Level, Number of Cases, Comparisons: Adult, 70, V; College, 174, D; High School, 808, V; High School, 43, R; 8th Grade, 245³, RV
Grade Level (remaining entries): College; High School; Seniors; College; College; High School; College; College; High School; College
Number of Cases (remaining entries): 111; 135; 100; 97; 200; 79
Comparisons (remaining entries): RV; R; RV; RV; R

¹ Key for types of tests: AA=Original Army Alpha; TF=true-false; MC=multiple choice; T=Terman Group Intelligence Test.
² Key for characteristics compared: V=validity; R=reliability; D=distribution of scores.
³ In ten groups.


TABLE II
CONCLUSIONS CONCERNING THE CORRECTION OF TEST SCORES FOR GUESSING

Bibliog. No.  Author                Correction for Guessing
77            Yerkes                Increased the correlation between the true-false or
                                    same-opposite tests and the total scores on Army Alpha
72            West                  Is of doubtful value
9             Brinkley              Increased validity
56            Ruch                  Decreased reliability
59            Ruch and DeGraff      Increased reliability and validity of multiple-choice;
                                    increased validity and lowered reliability of true-false
49            Paterson and Langlie  Lowered reliability and validity
60            Ruch and Stoddard     Lowered reliability
75            Wood                  Lowered reliability and increased validity in 3 of 5
                                    tests; little difference in other two
1             Arnold                Lowered reliability; increased validity
24            Foster and Ruch       Over-penalized
76            Wood                  Increased validity of true-false test, decreased
                                    validity of multiple choice
58            Ruch and Charles      Increased validity and decreased reliability of
                                    true-false and 2-response multiple-choice. Caused no
                                    significant difference on 3- and 5-response
                                    multiple-choice
41            Martin                Caused no significant difference in reliability or
                                    validity
21            Dunlap et al.         Lowered reliability when pupils were directed not to
                                    guess


A number of attempts have been made to devise a scoring
formula that will have a greater validity and reliability than either
“R” or $R - \frac{W}{n-1}$. The method of partial and multiple correlation
was first suggested by Thurstone (65). Staffelbach (63), using as
a criterion of validity seven consecutive semester marks in junior
high school social sciences, endeavored to devise a plan for including
the omitted items in the scoring. By the use of partial and multiple
correlations, he found the proper weights to be 1 for the correct
items; -1.1 for the incorrect; and .7 for the omitted. This yields
the formula S = R + .7 O - 1.1 W.

Foster and Ruch also attempted to include the omitted items in
the formula, using the method devised by Thurstone. They concluded
that the method is not well suited to the true-false test, since
the weights must be worked out for each test separately. Mrs.
Wood (76) found the zero-order coefficients of correlation between
the validity criterion and the numbers of correct, incorrect and
omitted items, but did not set up a regression equation. Her results
differ from those of Staffelbach in that the correlation between
number of omitted items and the criterion is negative.

Peatman (50) reports the use of weights for individual items
determined by Clark’s (16) formula for obtaining indices of validity.
He concludes that while the formula may be of some value in deciding
on the usefulness of items in a test, it is not valuable for determining
grades. Wheldon and Davies (73) report the use in the Yale
Law School of the formula S = 2W + O, that yields a series of scores
in which achievement varies inversely with the score, somewhat as
in golf.
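
The two weighted formulas quoted in this section, Staffelbach's S = R + .7 O - 1.1 W and the Yale Law School's S = 2W + O, can be compared on the same paper. The sketch below is illustrative only: the function names and the sample counts of right, wrong and omitted items are hypothetical, and, as Foster and Ruch observe, weights such as Staffelbach's would have to be worked out afresh for any other test.

```python
def staffelbach_score(right: int, wrong: int, omitted: int) -> float:
    """S = R + .7 O - 1.1 W, with the weights Staffelbach reported for his test."""
    return right + 0.7 * omitted - 1.1 * wrong


def yale_score(wrong: int, omitted: int) -> int:
    """S = 2W + O; a lower score means better achievement, as in golf."""
    return 2 * wrong + omitted


# Hypothetical 60-item paper: 40 right, 10 wrong, 10 omitted.
weighted = staffelbach_score(right=40, wrong=10, omitted=10)
penalty = yale_score(wrong=10, omitted=10)
# A perfect paper carries no penalty under the Yale formula.
perfect_penalty = yale_score(wrong=0, omitted=0)
```

Both formulas treat an omission as less costly than an error, but they run in opposite directions: the first rewards achievement with a higher score, the second with a lower one.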

Rich (53) has set up a scale for scoring the true-false test which
runs from 0 to 100, avoiding negative scores. Miller (45) has devised
a formula that may be applied to multiple-response types with more 
than one correct response for each item. 


MODIFICATIONS OF THE TRUE-FALSE TEST


Several writers have described techniques to decrease the factor 
of guessing. Most of these modified forms take longer to administer 
than the ordinary true-false procedure requires. The possibility that 
lengthening the true-false test a corresponding amount would make 
it more reliable and valid than the modified form needs investigation. 

Greene (28) presented each item both as a true and a false state- 
ment, and gave credit for each correct pair. Christensen (15) 
administered a 100-item true-false and a 100-item multiple-choice
test, using the same questions in both forms. To obtain credit on 
an item, both forms must be correctly answered. Fenton and 
Lehmann (23) allowed the pupils to qualify as many statements as 
they pleased. While some advantages for the method were found in 
other respects, it made little difference as far as the scores were con- 
cerned. McCluskey and Curtis (40) and also Barton (4) required 
the pupils to correct the statements they marked false. Miller (44) 
introduced a type of statement that might be true or not according 
to conditions not present in the statement, thus reducing the factor
of chance to one-third.


THE ADVISABILITY OF CHANGING RESPONSES 


Several studies have been conducted to indicate how the students 
taking true-false tests should be instructed in the matter of changing 
responses after they are once recorded (28, 30, 19, 33). These 
studies are unanimous in agreeing that changing responses raises the 
score in two-thirds of the cases, and that competent students profit 
more by changing responses than do inferior students. 


ORAL AND MIMEOGRAPHED (OR PRINTED) PRESENTATION 


Knight (34) presented the true-false test to his classes orally. 
Gates (26) presented it both in oral and written form and found the 
two methods equally desirable. The relative merit of oral and 
mimeographed presentation has been further investigated by
Irwin (31), Lehmann (35) and Stump (64). 

Irwin used two groups in high school biology of approximately 
equal ability as indicated by class marks. A series of tests was 
presented to each group, in oral form to one group and in mimeo- 
graphed form to the other. He concluded, from the slight difference 
in average scores, that the two methods were equally effective. 

Lehmann compared the reliability coefficients and the number of 
errors on the same test when given to the same pupils in both oral 
and mimeographed form. He found no real difference. Stump gave 
the same test orally and then in written form to the same pupils and 
correlated the scores. He obtained a coefficient of .47 and concluded 
that the extra time required for mimeographing the tests would not 
be warranted by any difference in results. 


A GENERAL EVALUATION OF THE TRUE-FALSE TEST 


On the basis of the studies cited above, a number of statements 
may be made in summary. 

1. The value of the true-false test as a device for measuring 
information has been established. While it has been found to be 
slightly less reliable and valid than the recall and multiple-choice 
tests, it is at least equal in these respects to the essay test. Its value 
as a check on effectiveness of instruction and as a teaching device 
has been pointed out. 

2. While a slight amount of inaccurate information may be fixed 
in the minds of the pupils by false statements, this is more than
offset by the positive effect of the true statements. If the papers are 
corrected in class, the positive effect is still more evident. 

3. In the construction of the true-false test, it is necessary to 
give particular attention to language and other peculiarities that may 
provide a clue to the answer for a pupil with insufficient knowledge. 

4. The proportion of correct responses for pupils who claim to 
be certain of their answers is only slightly greater than for pupils 
claiming to have guessed. In general, items that afford difficulty are 
more likely to be guessed “true” than “false.”

5. In general, it has been found that the correction for guessing
by means of the formula “rights minus wrongs” raises the validity
of the test and lowers its reliability, as compared with the results
when “number of items answered correctly” is used as the score.
However, the differences in reliability and validity are usually very
slight.

6. Several modifications of the true-false test have been proposed
to decrease the factor of guessing. Most of these take longer to 
administer than does the conventional form. Whether a correspond- 
ing increase in the time allowed for the conventional form would 
make it more valid and reliable than the modified form needs 
investigation. 

7. Competent pupils profit more by changing the responses once 
recorded than do inferior pupils. On the whole, about twice as many 
scores were raised as were lowered by the changes. 

8. The studies dealing with the relative values of oral and 
mimeographed (or printed) presentation of the true-false statements 
are not adequate to warrant a conclusion. 
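The two scoring methods compared in statement 5 can be illustrated by a short sketch. It assumes that omitted items are recorded as `None` and count neither as rights nor as wrongs; the answer key and the pupil's responses are hypothetical. The usual rationale for the formula is that, on two-choice items, blind guessing is expected to yield as many wrongs as rights, so that subtracting the wrongs removes the expected gain from guessing.

```python
def score_true_false(responses, key):
    """Score a true-false paper by 'number right' and by 'rights minus wrongs'.

    responses: list of True, False, or None (None marks an omitted item).
    key:       list of True/False giving the correct answer to each item.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    rights = sum(
        1 for given, correct in zip(responses, key)
        if given is not None and given == correct
    )
    wrongs = sum(
        1 for given, correct in zip(responses, key)
        if given is not None and given != correct
    )
    return {
        "rights": rights,
        "wrongs": wrongs,
        "omitted": len(key) - rights - wrongs,
        "number_right": rights,                  # uncorrected score
        "rights_minus_wrongs": rights - wrongs,  # corrected for guessing
    }


# Hypothetical six-item test: three right, two wrong, one omitted.
answer_key = [True, False, True, True, False, False]
pupil = [True, True, None, True, False, True]

result = score_true_false(pupil, answer_key)
print(result)
```

Under the corrected score an omitted item costs the pupil nothing, whereas a wrong answer cancels one right answer.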
EDUCATIONAL TESTS

BY VERNON JONES AND CLAUDE NEET
Clark University 


This summary is designed to cover the developments in the field 
of educational tests in the United States during the years 1931 and 
1932. During this period 173 books, articles, and tests have appeared 
which are of sufficient importance to deserve mention. 

It is believed that one major purpose of such a summary as this 
is to furnish students of measurement a complete bibliography of the 
materials of any appreciable significance, and to classify these under 
appropriate headings. Another major purpose is to indicate the 
critical issues and the more important trends in research and inter- 
pretation in the field, so that the non-specialist may see the directions 
in which the work is tending. We shall concern ourselves first with 
the latter objective and shall give as much of the space assigned to us 
to this as is consistent with the inclusion of an extensive bibliography. 


I. GENERAL TRENDS AND CRITICAL ISSUES 


(a) The Converging of Tests upon the Problems of Learning. 
One trend which is unmistakable, during the years under review, is 
that of relating tests to the problems of learning. This is noticeable 
particularly in the construction and use of diagnostic tests. In the 
main, of course, test construction still proceeds on the empirical basis 
but more and more studies are appearing in which an analysis is first 
made of learning errors and various types of deficiencies in the sub- 
ject and then the test is so constructed as to tap the pupil’s knowledge 
or skill at those critical points. Pioneer work in the analysis of errors 
in reading was done quite a number of years ago by such experi- 
menters as Judd, Gray, and Gates, and similar work in mathematics 
was done by Thorndike, Boswell, Symonds, and others. Such work 
has in the past formed the basis for a few tests, such as the Gray 
Reading Test and the Gates Reading Tests, but it appears that this 
excellent method of diagnostic test construction is at present coming 
more and more into favor. Illustrations of such analytical work and 
the relating of it to test construction are found in the work of 
Monroe (92) in reading, MacRae and Uhl (86) in algebra, Hartmann (52)
in spelling, and Newland (104) in the writing of Arabic
numerals.

The growth in instructional tests and unit tests also illustrates this 
trend toward the relating of tests to the problems of learning and 
teaching. Unfortunately, however, the construction of such tests 
has proceeded in general on a hit-or-miss basis rather than on an 
analytical one. Typically they consist of a series of questions based 
on a textbook course or a unit of work, and though doubtlessly they 
are of practical value in teaching, they add little or nothing to the 
theory or technique of testing and nothing to the knowledge of the 
nature of the learning process in the subjects involved. 

(b) The Development of New Tests and the Use of Existing 
Ones for Educational Selection. This emphasis is, of course, not 
new, but it has grown with striking speed during the period under 
review. In years past we have been accustomed to speak of educa- 
tional guidance and selection with the emphasis on the guidance of 
the individual. But suddenly the emphasis has shifted, and we see 
forceful groups writing and encouraging cooperative research in the 
direction of selecting individuals for the purposes of society. Dr. 
Paul Monroe (93) in calling an international Conference on Exam- 
inations in London in 1931 stressed the uses of examinations as a 
means of admission to various employments and professions and as 
a “means of social control.” He says that the examination is an
institution at stake today in social and political discussion as well as 
in education, and he feels that the scientific development of examina- 
tions may in some way be related to the solution of the mounting 
problem of the overproduction of persons trained for professional 
and clerical activities in democratic societies. 

Dr. Ben Wood (171), whose endeavors have been backed by 
the Carnegie Foundation for the Advancement of Teaching and the 
General Education Board, stresses the administrative, including the 
selective, function of tests in higher education. He says that the 
outstanding development of educational tests during the last few 
years has been in “major strategy as opposed to minor tactics in the
use of educational measurements.” It seems evident that the trend 
toward the development and use of tests in individual diagnosis and 
teaching is included by him under the “minor tactics,” and the 
“major strategy” has to do with the development and use of tests 
for survey purposes, for evaluating college teaching and college cur- 
ricula, and for selecting entrants and graduates. 

Here we have the most clear-cut issue, the reviewers believe, that 
has so far developed in the field of educational tests. Shall the major
trend in tests be in the direction of evaluating the needs, abilities, and 
specific learnings of the individual with a view to encouraging his 
growth to the maximum, or shall tests be developed and used pri- 
marily for evaluating the needs and activities of social groups with a 
view to selecting individuals in light of social demands? This is a 
question to be answered not by science but by philosophy. However, 
the answer which each specialist in measurement gives to it will be of 
interest to the science of measurement because it will determine the 
direction in which his research and interpretations will go. 

(c) The Employing of Improved Testing Techniques in the 
Form of Comprehensive Examinations as a Means of Reducing Some 
of the “Evils” of the Gathering-of-Credits System in College. 
Great impetus was given to this trend by the study of relations of 
secondary and higher education in Pennsylvania, the results of which 
have been emphasized by Learned (71) and McConn (82). There 
were six colleges in Pennsylvania which gave the Carnegie College 
Achievement Test, consisting of about 2,700 items, to all classes from 
freshmen to seniors inclusive. Learned reports that in very few 
academic subjects did the seniors exceed the freshmen and in an 
appreciable number they actually made lower scores. He is inclined 
to attribute this to what he refers to as a system of sacrificing per- 
manence of intellectual equipment to the practical aim of accumulat- 
ing credits. McConn (82) reviews these findings and recommends 
the use of honors courses and carefully constructed comprehensive 
examinations with all students as a means of remedying this situation. 
E. S. Jones (64) in his book on Comprehensive Examinations in 
Colleges agrees with McConn and definitely subscribes to the belief 
that the outcomes which we desire must first be incorporated in or 
encouraged by the examinations which we give. The General Edu- 
cation Board has appropriated $500,000 to be used over a ten-year 
period in the construction of rather comprehensive and standardized 
examinations in the major fields of college instruction. The extensive 
survey and research work which this grant will stimulate in the field 
of college tests will, we believe, serve the two-fold purpose of presenting
further evidence against the assumption of increase in “general
achievement” under the present system in college, and of making
test materials readily available from which comprehensive examina- 
tions may be constructed. 

The reviewers are inclined to believe that this trend will lead to
much controversy. The two major points in the controversy will
probably be: (1) What are the aims of college instruction, “general
achievement” or specific or specialized achievements? and (2) Are
comprehensive examinations valid for the measurement of the aims 
decided upon? The first is strictly not a problem for measurement 
to decide; the latter is so important and has been emphasized so 
strongly during the period under review that it is being discussed 
separately as our fourth trend. 

(d) The Growing Insistence that Each Educational Test Demon- 
strate Its Validity in the Situation Where It Is Used. In contrast to 
the statistical method of test and test-item validation which hereto- 
fore has been used very widely, Osburn (171), Tyler (159-162), 
Peters (116-118), and Barr (3) have argued for an analytical 
approach. Tyler has written much on this problem, and his funda- 
mental thesis is that measurement experts should not be content for 
their tests merely to measure something reliably, but on the contrary 
they should ever seek to measure reliably something important among 
educational objectives. Peters (116, 117) has made a study of the
relation of standardized tests to educational objectives. He exam- 
ined 183 widely used tests and found that in less than one-third had 
any analytical method been employed in the selection of test items. 
He goes on to intimate that test makers do not know that they are 
measuring what they ought to measure until they analyze their tests 
in light of desirable objectives. The leaders in educational measure- 
ment are not deaf to the very fundamental criticism of those like 
Peters who say that tests are in many cases not measuring what they 
ought to measure, but they are inclined to say that the difficulty is not 
so much one of measurement as it is one of knowing specifically what 
the specialists in objectives would have them measure. Thorndike’s 
view is probably typical of the realists in the field of measurement 
when he says (93, p. 29) “If the educational theorists or teachers 
will state the changes which they wish to make in their pupils so that 
they are identifiable in reality, the science of educational measurement 
can find some means to measure those changes.” However, this 
statement is not a defense of present-day tests or of present-day 
validities. It is rather a challenge to the educational theorists to be 
specific in their analysis of what they wish measured; and it is a call 
to measurement to test those outcomes validly. There used to exist 
the assumption among many that if a test measured something 
reliably it was therefore worthwhile, and if a test-item showed con- 
sistency with the battery as a whole it was valid. This assumption 
is proving particularly unsatisfactory in the field of educational tests. 
There is a growing demand that educational tests shall prove that
they are measuring accurately the desired outcomes at each level of 
achievement where they purport to test—from the primary grades to 
the university. The higher we extend measurement up the educa- 
tional ladder the more do we have to meet the demands of the sub- 
ject-matter specialist that our measurements prove their validity. 
Neither the “minor tactics” nor the “major strategy,” to which 
Wood refers, will work here: only the cold facts based upon patient 
research of the subject-matter expert and the measurement specialist 
will satisfy this insistent demand that the tests demonstrate that they 
are measuring something educationally worthwhile. 
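The distinction drawn above between an item's consistency with the battery and its validity may be made concrete by a small sketch. It computes, for each item, (a) the correlation with the total score and (b) the correlation with an outside criterion. All data are hypothetical, the criterion is assumed to be some independent measure of the desired outcome, and the total score here includes the item itself.

```python
from math import sqrt


def corr(x, y):
    """Product-moment correlation; returns None if either list has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def item_statistics(item_matrix, criterion):
    """For each item return (correlation with total, correlation with criterion)."""
    totals = [sum(row) for row in item_matrix]
    stats = []
    for j in range(len(item_matrix[0])):
        item = [row[j] for row in item_matrix]
        stats.append((corr(item, totals), corr(item, criterion)))
    return stats


# Hypothetical data: six pupils, three items scored 1 (right) or 0 (wrong).
item_scores = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 1],
]
# Hypothetical outside criterion, e.g. an independent rating of the objective.
criterion = [40, 85, 70, 45, 30, 80]

stats = item_statistics(item_scores, criterion)
for number, (with_total, with_criterion) in enumerate(stats, start=1):
    print(f"item {number}: total r = {with_total:.2f}, criterion r = {with_criterion:.2f}")
```

In these invented data the first item agrees well with the total score yet is slightly negatively related to the criterion, which is precisely the case in which internal consistency alone would mislead.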


II. CLASSIFICATION OF BOOKS AND ARTICLES


Due to limitation of space it will not be possible to indicate the 
nature of the contributions made in the various studies contained in 
the bibliography. But, for the convenience of those who may be 
interested in studying the developments in particular directions, a 
classification of the material under more or less conventional headings 
is given. Under each heading an attempt has been made to mention 
the more significant articles first. 


1. General Summaries, Reviews, Textbooks, and Articles Point- 
ing to Broad Trends in Research and Interpretation 
(a) Summaries and Reviews: 171, 65; (b) Textbooks: 15, 153, 
106, 172, 126, 165; (c) Trends: 93, 81, 66, 150, 96, 109, 80, 149, 21, 
113, 26; (d) Broad Problem of Objectives and Test Validity: 93 
(pp. 22-30), 117, 116, 161, 160, 158, 46, 69, 64. 
2. Development and Use of Tests for Survey and Experimental 
Purposes 
(a) Survey and Experimental: 71, 63, 82, 37, 78, 16, 44, 49, 57; 
(b) New Tests—Batteries: 23, 42, 121; Reading: 47, 40, 102; 
English: 75, 101, 85, 137, 146, 17; Geography: 169; Science: 145, 
88; Languages: 91; History and Civics: 77, 8, 87; Educational 
Psychology: 110, 111, 123; Stenography: 7; Physical Efficiency 
Tests: a great extension of measurement in this field, but advances 
cannot be adequately summarized here—see 9, 129 and American 
Physical Education Association Quarterly, volumes 2 and 3. 
3. Development and Use of Tests for Educational Diagnosis
and for Teaching
(a) Diagnosis: 15, 92, 94, 14, 86, 152, 34, 52, 155, 104; 
(b) Instruction: 125, 74, 55, 156, 50, 105, 134, 135, 6, 43, 1, 68, 119. 
4. Development and Use of Tests for Prognosis, Guidance and
Selection 

(a) General: 72, 144, 124, 36; (b) Dependability of Marks and of 
Comprehensive Examinations: 163, 28, 127, 151; (c) Use of Interest 
Questionnaires and Rating Scales: 148, 67, 33, 142, 133, 53; 
(d) Prognosis and Selection in Medicine: 99, 97, 98; (e) Prognosis 
in Music: 168, 100, 154, 79, 95, 90; (f) Prognosis in Miscellaneous 
Subjects: 10, 19, 20, 48, 115, 51. 


5. Evaluating and Improving Teachers’ Classroom Tests and 
Marks 
(a) General: 64, 112, 138, 139, 118, 54, 5, 39, 157, 164, 131; 
(b) Short-Answer Tests: 140, 58, 61, 166, 70, 167, 45, 120, 147, 
4, 29. 


6. Intensive Study of Measuring Instruments and Techniques 

(a) General Problems of Validity and Reliability: 159, 162, 161, 
171 (pp. 5-20), 116, 38, 152, 112, 41; (b) Study of Current Tests: 
168, 2, 130, 56, 84, 11, 13, 18; (c) A.Q. and Similar Techniques: 60, 
27, 83, 132, 12; (d) Techniques for Standardization: 59, 35, 30, 31, 
73, 76, 41, 22, 107, 167, 122, 24, 136, 143, 114, 25, 128, 32, 108, 62, 
170, 103, 173. 


7. Bibliographies 
141, 171, 65, 89. 


BIBLIOGRAPHY 


1. Andrews, G. G., and Anderson, H. R., Achievement Tests in World
History. Boston: Ginn, 1931.

2. Ballenger, H. L., Validation of the Iowa Elementary Language Tests.
Univ. Iowa Stud., 1931, 6, No. 3. Pp. 59.

3. Barr, A. S., Measurements and Progressive Education. J. Educ. Res.,
1930, 22, 317-319.

4. Barton, W. A., Improving the True-False Examination. School & Soc.,
1931, 34, 544-546.

5. Bayles, E. E., and Bedell, R. C., A Study of Comparative Validity as
Shown by a Group of Objective Tests. J. Educ. Res., 1931, 23, 8-16.

6. Bishop, F., and Irwin, M. E., Instructional Tests in Plane Geometry.
Yonkers, N. Y.: World Book Co., 1932.

7. Blackstone, E. G., and McLaughlin, M. W., Blackstone Stenographic
Proficiency Tests. Stenography Test. Yonkers, N. Y.: World Book
Co., 1932.

8. Bowman, L. G., Bowman United States History Test. Bloomington, Ill.:
Pub. School Publ. Co., 1932.

9. Brace, D. K., The Development of Measures of Pupil Achievement in
Physical Education. Res. Quar. Amer. Phys. Educ. Asso., 1931, 2,
No. 3, 32-37.

10. Brewer, J. M., Self-Measuring Scale for Achievement and Experience in
Work and Education. Chicago: Stoelting, 1931.

11. Broom, M. E., Douglas, J., and Rudd, M., On the Validity of Silent
Reading Tests. J. Appl. Psychol., 1931, 15, 35-38.

12. Brown, A. W., and Lind, C., School Achievement in Relation to Mental
Age: a Comparative Study. J. Educ. Psychol., 1931, 22, 561-576.

13. Brown, C. M., An Evaluation of the Minnesota Rating Scale for Home
Economics Teachers. Minneapolis: Univ. Minn. Press, 1931. Pp. 29.

14. Brueckner, L. J., and Elwell, M., Reliability of Diagnosis of Error in
Multiplication of Fractions. J. Educ. Res., 1932, 26, 175-185.

15. Brueckner, L. J., and Melby, E. O., Diagnostic and Remedial Teaching.
Boston: Houghton Mifflin, 1931. Pp. xviii+598.

16. Caldwell, F. F., A Comparison of Blind and Seeing Children in Certain
Educational Abilities. New York: Amer. Found. for Blind, 1932.
Pp. 28.

17. Carroll, H. A., A Standardized Test of Prose Appreciation for High
School Pupils. J. Educ. Psychol., 1932, 23, 401-410, 604-606.

18. Carroll, H. A., A Preliminary Report on a Study of the Interrelationship
of Certain Appreciations. J. Educ. Psychol., 1932, 23, 505-510.

19. Castleman, N., Grover, M., and Moore, H., An Aptitude Test for High
School Teachers. J. Appl. Psychol., 1931, 15, 208-213.

20. Cheydleur, F. D., Use of Placement Tests in Modern Languages at the
University of Wisconsin. Mod. Lang. J., 1931, 15, 262-280.

21. Chism, L. L., Classification and Promotion Practices in Elementary
School. Elem. School J., 1932, 33, 89-91.

22. Cook, W. W., The Measurement of General Spelling Ability Involving
Controlled Comparisons Between Techniques. Univ. Iowa Stud.: Stud.
Educ., 1932, 6, No. 6. Pp. 112.

23. Cooperative Test Service. Educ. Rec., 1933, 14, 115-119.

24. Corey, S. M., The Effect of Weighting Exercises in a New Type
Examination. J. Educ. Psychol., 1930, 21, 383-385.

25. Courtis, S. A., Maturation Units for the Measurement of Growth. School
& Soc., 1929, 30, 683-690.

26. Courtis, S. A., Measurement of the Efficiency of Teaching. Educ. Admin.
& Supervis., 1932, 18, 401-412.

27. Coy, G. L., A Study of Various Factors Which Influence the Use of the
Accomplishment Quotient as a Measure of Teaching Efficiency. J.
Educ. Res., 1930, 21, 29-42.

28. Crawford, A. B., and Burnham, P. S., Entrance Examinations and
College Achievement. School & Soc., 1932, 36, 344-352, 378-384.

29. Crawford, C. C., Preference Versus Performance in Taking True-False
Tests. School Rev., 1932, 40, 138-141.

30. Cureton, E. E., Errors of Measurement and Correlation. Arch. Psychol.,
1931, No. 125.

31. Cureton, E. E., and Dunlap, J. W., A Nomograph for Estimating a
Reliability Coefficient by the Spearman-Brown Formula and for
Computing Its Probable Error. J. Educ. Psychol., 1930, 21, 68-69.

32. Cureton, E. E., and Dunlap, J. W., Scoring the Rearrangement or
Continuity Test. School Rev., 1930, 38, 613-616.

33. Drake, C. A., A Study of an Interest Test and an Affectivity Test in
Forecasting Freshman Success in College. Teach. Coll. Contrib. Educ.,
1931, No. 504. Pp. vi+60.

34. Dvorak, A., and Encisu, E., The Efficiency of Remedial Teaching.
Educ. Admin. & Supervis., 1932, 18, 466-471.

35. Edgerton, H. A., A Table for Finding the Probable Error of R Obtained
by the Use of the Spearman-Brown Formula (N=2). J. Appl.
Psychol., 1930, 14, 296-302.

36. Edmiston, R. W., Method of Improving Prediction. School & Soc., 1931,
33, 411-414.

37. Eells, W. C., and Fox, C. S., Sex Differences in Mathematical
Achievement of Junior College Students. J. Educ. Psychol., 1932, 23, 381-386.

38. Egan, E. P., The Effect of Fore-Exercises on Test Reliability. Nashville,
Tenn.: George Peabody Coll., 1932. Pp. 37.

39. Eurich, A. C., Four Types of Examinations Compared and Evaluated. J.
Educ. Psychol., 1931, 22, 268-278.

40. Eurich, A. C., A Method of Measuring Retention in Reading. J. Educ.
Res., 1931, 24, 202-208.

41. Foran, T. G., A Note on Methods of Measuring Reliability. J. Educ.
Psychol., 1931, 22, 383-387.

42. Gates, A. I., et al., The Modern School Achievement Tests. New York:
Bur. Publ., Teach. Coll., Columbia Univ.

43. Glenn, E. R., and Gruenberg, B. C., Instructional Tests in General
Science. Yonkers, N. Y.: World Book Co., 1932.

44. Grace, A. G., Individual Differences in Adults. J. Educ. Psychol., 1932,
23, 179-186.

45. Granich, L., Technique for Experimentation on Guessing in Objective
Tests. J. Educ. Psychol., 1931, 22, 145-156.

46. Greene, E. B., The Retention of Information Learned in College Courses.
J. Educ. Res., 1931, 24, 262-273.

47. Greene, H. A., Jorgensen, A. N., and Kelley, V. H., Iowa Silent
Reading Test. Yonkers, N. Y.: World Book Co., 1931.

48. Grover, C. C., Results of an Experiment in Predicting Success in First
Year Algebra in Two Oakland Junior High Schools. J. Educ. Psychol.,
1932, 23, 309-314.

49. Guiler, W. S., The Ohio Survey of English Usage. Columbus: Ohio
State Dept. Educ., 1931.

50. Guiler, W. S., Remediation of College Freshmen in Sentence Structure.
J. Educ. Res., 1932, 26, 110-115.

51. Guilford, J. P., and Guilford, R. B., A Prognostic Test for Students in
Design. J. Appl. Psychol., 1931, 15, 335-345.

52. Hartmann, G. W., The Relative Influence of Visual and Auditory Factors
in Spelling Ability. J. Educ. Psychol., 1931, 22, 691-699.

53. Hartson, L. D., Validation of the Rating Scales Used with Candidates for
Admission to Oberlin College. School & Soc., 1932, 36, 413-416.

54. Heilman, J. D., Reliability of College Teachers' Classroom Tests. Educ.
Admin. & Supervis., 1931, 17, 535-543.

55. Hertzberg, O. E., Heilman, J. D., and Leuenberger, H. W., The Value
of Objective Tests as Teaching Devices in Educational Psychology
Classes. J. Educ. Psychol., 1932, 23, 371-380.

56. Hevner, K., A Study of Tests for Appreciation of Music. J. Appl.
Psychol., 1931, 15, 575-583.

57. Howenstein, A. E., Reports of the Seventh, Eighth, and Ninth Annual
Nation-Wide Testing Programs. Bloomington, Ill.: Pub. School Publ.
Co., 1930, 1931, 1932.

58. Hotms, G., and Hemereper, E., A Statistical Study of a New Type of
Objective Examination Question. J. Educ. Res., 1931, 24, 286-292.

59. Holzinger, K. J., Reliability of a Single Test Item. J. Educ. Psychol.,
1932, 23, 411-417.

60. Huffaker, C. L., The Probable Error of the Accomplishment Quotient.
J. Educ. Psychol., 1930, 21, 550-551.

61. Hurd, A. W., Comparisons of Short Answer and Multiple Choice Tests
Covering Identical Subject Content. J. Educ. Res., 1932, 26, 28-30.

62. John, L., A Comparison of Four Methods of Scoring the Continuity Test.
School Rev., 1930, 38, 617-621.

63. Johnston, J. B., et al., The 1932 College Sophomore Testing Program.
Educ. Rec., 1932, 13, 290-343.

64. Jones, E. S., Comprehensive Examinations in American Colleges. New
York: Macmillan, 1933. Pp. xix+436.

65. Jones, V., and Crook, M., Educational Tests. Psychol. Bull., 1932, 29,
120-146.

66. Judd, C. H., The Eastbourne Conference. Harvard Teach. Rec., 1932, 2,
147-154.

67. King, L. H., Mental and Interest Tests, Their Evaluation and Comparative
Effectiveness as Factors of Prognosis in Secondary Education. Teach.
Coll. Contrib. Educ., 1931, No. 444.

68. Kirkpatrick, J. E., and Greene, H. A., Pupil-Teacher Handbook of
Objective Test Exercises in High School Physics. Bloomington, Ill.:
Pub. School Publ. Co., 1932-33.

69. Krey, A. C., and Wesley, E. B., Does the New-Type Test Measure
Results. Hist. Outlook, 1932, 23, 7-21.

70. Krueger, W. C. F., An Experimental Study of Certain Phases of a
True-False Test. J. Educ. Psychol., 1932, 23, 81-91.

71. Learned, W. S., Study of the Relation of Secondary and Higher
Education in Pennsylvania. 25th Ann. Rep. Carnegie Found. Adv. Teach.,
1930, 51-66.

72. Learned, W. S., Admission to College. Educ. Rec., 1933, 14, 23-48.

73. Lentz, T. F., Hirshstein, B., and Finch, F. A., Evaluation of Methods
of Evaluating Test Items. J. Educ. Psychol., 1932, 23, 344-350.

74. Leonard, J. P., The Use of Practice Exercises in the Teaching of
Capitalization and Punctuation. New York: Bur. Publ., Teach. Coll., Columbia
Univ., 1930. Pp. 78.

75. Leonard, J. P., Leonard Diagnostic Test in Punctuation and Capitalization.
Yonkers, N. Y.: World Book Co., 1931.

76. Lincoln, E. A., The Variability of Reliability Coefficients. J. Educ.
Psychol., 1932, 23, 11-14.

77. Lindquist, E. F., Form of the American History Examination of the
Coöperative Test Service. Educ. Rec., 1931, 12, 459-475.

78. Lund, F. H., Sex Differences in Type of Educational Mastery. J. Educ.
Psychol., 1932, 23, 321-330.

79. McCarthy, D., A Study of the Seashore Measures of Musical Talent.
J. Appl. Psychol., 1930, 14, 437-455.

80. McClure, W. E., The Status of Psychological Testing in Large City
Public School Systems. J. Appl. Psychol., 1930, 14, 486-496.

81. McConn, C. M., The Co-operative Test Service. J. Higher Educ., 1931,
2, 225-232.

82. McConn, M., How Much Do College Students Learn? No. Amer. Rev.,
1931, 232, 446-454.

83. McCrory, J. R., The Reliability of the Accomplishment Quotient. J.
Educ. Res., 1932, 25, 27-39.

84. McElwee, E. W., Standardization of the Stenquist Mechanical Assembling
Test Series III. J. Educ. Psychol., 1932, 23, 451-454.

85. McKee, J. H., Wykoff, G. S., and Remmers, H. H., The Purdue
Placement Test in English. Boston: Houghton Mifflin.

86. MacRae, M., and Uhl, W. L., Types of Errors and Remedial Work in
the Fundamental Processes of Algebra. J. Educ. Res., 1932, 26, 12-21.

87. Magruder, F. A., Chambers, M. M., and Clinton, R. J., American Civics
and Government Test for High Schools and Colleges. Bloomington,
Ill.: Pub. School Publ. Co., 1931.

88. Malin, J. E., Diagnostic Test in the Mechanics of High School Chemistry.
Bloomington, Ill.: Pub. School Publ. Co., 1932.

89. Malott, J. E., and Segel, D., Tests in Commercial Education: an
Annotated List. Washington: U. S. Dept. Interior, Dept. Educ., Circular,
1932, No. 56. Pp. 11.

90. Merry, R. V., Department of Special Studies: Adapting the Seashore
Musical Talent Tests for Use with Blind Pupils. Teach. Forum
for Instructors of Blind Children, 1931, 3, 15-19.

91. Mircnet, H., and Purrer, A. A., French Verb and Idiom Achievement
Tests. Boston: Heath, 1931.

92. Monroe, M., Children Who Cannot Read; the Analysis of Reading
Disabilities and the Use of Diagnostic Tests in the Instruction of Retarded
Readers. Chicago: Univ. Chicago Press, 1932. Pp. xvi+205.

93. Monroe, P. (Ed.), Conference on Examinations (Eastbourne, England).
New York: Bur. Publ., Teach. Coll., Columbia Univ., 1931. Pp. 376.

94. Monroe, W. S., and Engelhart, M. D., A Critical Summary of Research
Relating to the Teaching of Arithmetic. Educ. Res. Bull., 1931, No. 58.

95. More, G. V. D., Prognostic Testing in Music on the College Level. J.
Educ. Res., 1932, 26, 199-212.

96. Mort, P. R., and Gates, A. I., Acceptable Uses of Achievement Tests.
A Manual for Test Users. New York: Bur. Publ., Teach. Coll.,
Columbia Univ., 1932. Pp. ix+85.

97. Moss, F. A., Aptitude Tests in Selecting Medical Students. Person. J.,
1931, 10, 79-94.

98. Moss, F. A., Preliminary Report on Medical Aptitude Tests for 1931-32.
School & Soc., 1931, 34, 132-134.

99. Moss, F. A., Scholastic Aptitude Tests for Medical Students. Report for
1932. J. Asso. Med. Coll., 1933, Jan.

100. Mursell, J. L., Measuring Musical Ability and Achievement. A Study
of the Correlations of Seashore Test Scores and Other Variables. J.
Educ. Res., 1932, 25, 116-126.

101. Nelson, M. J., Nelson's High School English Test. Boston: Houghton
Mifflin.

102. Nelson, M. J., The Nelson Silent Reading Test, for Grades III to VIII
Inclusive. Boston: Houghton Mifflin, 1932.

103. Nesmith, R. W., Scoring the Continuity Test. School Rev., 1929, 37,
764-766.

104. Newland, T. E., Chart for Diagnosis of Illegibilities in Written Arabic
Numerals. Bloomington, Ill.: Pub. School Publ. Co., 1931.

105. Nifenecker, E. A. (Dir.), New York Spelling Tests, Series PW, and
New York Survey Tests in Arithmetic. New York: Bur. Ref., Res. &
Stat., Board Educ., 1932. Publ. No. 26; 25.

106. Odell, C. W., Educational Measurements in High School. New York:
Century, 1930. Pp. xiv+641.

107. Odell, C. W., Further Data Concerning the Effect of Weighting Exercises
in New-Type Examinations. J. Educ. Psychol., 1931, 22, 700-704.

108. Odell, C. W., Still More About Scoring Rearrangement or Continuity
Tests. School Rev., 1931, 39, 542-546.

109. Odell, C. W., Educational Measurement in the Secondary School. J.
Educ. Res., 1932, 26, 81-89.

110. Odell, C. W., A Test in Educational Measurements. School & Soc., 1932,
35, 810-814.

111. Odell, C. W., Odell Standard Achievement Test on Educational
Measurement. Bloomington, Ill.: Pub. School Publ. Co., 1932.

112. Orleans, J. B., and Symonds, P. M., The Comparative Reliabilities of
Standardized and Teacher-Made Achievement Tests When Given in
the Middle of the Year. J. Educ. Res., 1932, 25, 127-128.

113. Otis, A. I., Fallacious Arguments Regarding Ability Grouping.
Childhood Educ., 1931, 8, 171-180.

114. Peatman, J. G., The Influence of Weighted True-False Test Scores on
Grades. J. Educ. Psychol., 1930, 21, 143-147.

115. Perry, W. M., Prognosis of Abilities to Solve Exercises in Geometry.
J. Educ. Psychol., 1931, 22, 604-609.

116. Peters, C. C., and Altman, J. E., A Critical Study of the Content of
Standardized Tests in American History. J. Educ. Res., 1931, 23,
153-161.

117. Peters, C. C., and Crossley, E., The Relation of Standardized Tests to
Educational Objectives. Chap. 3 of 2nd Yrbk. Nat. Soc. Stud. Educ.
Sociol., Objective of Educ., 1929. Pp. 148-159.

118. Peters, C. C., and Martz, H. B., A Study of the Validity of Various
Types of Examinations. School & Soc., 1931, 33, 336-338.

119. Peterson, H. J., and Peterson, J. C., Instructional Tests in Psychology;
for Use with the New Self-Instructor and Tester. Manhattan, Kan.:
J. C. Peterson, 1932.

120. Phillips, D. P., Comparison of the Two-Response and Dictated Recall
Types of Spelling Tests. J. Educ. Res., 1931, 23, 17-24.

121. Pintner, R., Pintner Educational Achievement Tests, for Grades IV to
VIII. New York: Bur. Publ., Teach. Coll., Columbia Univ., 1931.

122. Potthoff, E. F., and Barnett, N. E., Comparison of Marks Based upon
Weighted and Unweighted Items in New Type Examination. J. Educ.
Psychol., 1932, 23, 92-98.

123. Potthoff, E. F., and Corey, S. M., A Standardized Test in Educational
Psychology. Bloomington, Ill.: Pub. School Publ. Co., 1931.

124. Pressey, L. C., Report on an Attempt at the Prognosis of Unusually
Good and Unusually Poor Scholastic Work. J. Educ. Psychol., 1932,
23, 387-389.

125. Pressey, L. C., and Pressey, S. L., Training College Freshmen to Read.
J. Educ. Res., 1930, 21, 203-211.

126. Pressey, S. L., and Pressey, L. C., Introduction to the Use of Standard
Tests. (Rev. ed.) Yonkers, N. Y.: World Book Co., 1931.

127. Pressey, S. L., Pressey, L. C., and Barnes, E. J., The Final Ordeal.
J. Higher Educ., 1932, 3, 261-264.

128. Remmers, H. H., The Equivalence of Judgments to Test Items in the
Sense of the Spearman-Brown Formula. J. Educ. Psychol., 1931, 22,
66-71.

129. Rogers, F. R., Physical Capacity Tests. New York: Barnes, 1931. Pp.
viii+53.

130. Ruch, G. M., and Meyer, S. H., Comparative Merits of Physics Tests.
School Sci. & Math., 1931, 31, 676-680.

131. Russell, C., Classroom Scaler and Grader. Boston: Ginn, 1931. Pp. 16.

132. St. John, C. W., Educational Achievement in Relation to Intelligence, as
Shown by Teachers' Marks, Promotions, and Scores in Standard Tests
in Certain Elementary Grades. Harvard Stud. Educ., 1930, No. 15.

133. St. John, C. W., Some Evidences of Effects of the Pupil's Classroom
Adjustment upon His Achievement Test Performance. J. Educ.
Psychol., 1932, 23, 489-504.

134. Sangren, P. V., and Reidy, A., Teachers Handbook and Manual of
Instructions for Sangren-Reidy Instructional Tests in Arithmetic, for
Grades 2-8 Inclusive. Bloomington, Ill.: Pub. School Publ. Co., 1931.

135. Sangren, P. V., and Wilson, M. C., Instructional Tests in Reading.
Bloomington, Ill.: Pub. School Publ. Co., 1932.

VERNON JONES AND CLAUDE NEET 


136. Scates, D. E., and Noffsinger, F. R., Factors Which Determine the
Effectiveness of Weighting. J. Educ. Res., 1931, 24, 280-285.

137. Shepherd, J. W., The Shepherd English Test. Boston: Houghton
Mifflin, 1932.

138. Sims, V. M., The Objectivity, Reliability, and Validity of an Essay
Examination Graded by Rating. J. Educ. Res., 1931, 24, 216-223.

139. Sims, V. M., Essay Examination Questions Classified on the Basis of
Objectivity. School & Soc., 1932, 35, 100-102.

140. Sims, V. M., and Knox, L. B., The Reliability and Validity of
Multiple-Response Tests When Presented Orally. J. Educ. Psychol., 1932, 23,
656-662.

141. Smith, H. L., and Wright, W. W., The Second Revision of the
Bibliography of Educational Measurements. Bull. School Educ., Ind. Univ.,
1933.

142. Sorenson, H., Some Factors for Pupil Control Measured and Related.
J. Educ. Psychol., 1932, 23, 1-10.

143. Staffelbach, E. H., Weighting Responses in True-False Examinations.
J. Educ. Psychol., 1930, 21, 136-139.

144. Stenquist, J. L., Baltimore Constantly Checks Results. J. Educ., 1930,
112, 183-185.

145. Stewart, A. W., and Ashbaugh, E. J., Stewart-Ashbaugh Physics Test;
Mechanics and Heat; Electricity; Sound and Light. Bloomington, Ill.:
Pub. School Publ. Co., 1931.

146. Stratton, C., Connor, W. L., and Redmond, F. A., The Cleveland
English Composition and Grammar Test. Boston: Houghton Mifflin, 1931.

147. Stump, N. F., Listening Versus Reading Method in the True-False
Examination. J. Appl. Psychol., 1931, 15, 555-562.

148. Symonds, P. M., Tests and Interest Questionnaires in the Guidance of
High School Boys. New York: Bur. Publ., Teach. Coll., Columbia
Univ., 1930. Pp. 61.

149. Symonds, P. M., Shall the I.Q. be Used for Sectioning in the High
School? J. Educ. Res., 1931, 24, 138-140.

150. Symonds, P. M., The Testing Program for the High School. School
Rev., 1932, 40, 97-108.

151. Taylor, J. C., The Reliability of Quarterly Marks in the Seventh Grade
of Junior High Schools, Together with the Value of Certain Standard
Tests in Predicting Them. Johns Hopkins Univ. Stud. Educ., 1931,
No. 17. Pp. 54.

152. Thurstone, T. G., The Difficulty of a Test and Its Diagnostic Value. J.
Educ. Psychol., 1932, 23, 335-343.

153. Tiegs, E. W., Tests and Measurements for Teachers. Boston: Houghton
Mifflin, 1931. Pp. xx+470.

154. Tilson, L. M., A Study of the Predictive Value of Musical Talent Tests
for Teacher Training Purposes. Teach. Coll. J., Ind. State Teach.
Coll., 1931, 3, 101-128.

155. Traxler, A. E., The Correlation Between Reading Rate and
Comprehension. J. Educ. Res., 1932, 26, 97-101.







156. Turney, A. H., The Effect of Frequent Short Objective Tests. School
& Soc., 1931, 33, 760-762.

157. Turney, A. H., The Cumulative Reliability of Frequent Short Objective
Tests. J. Educ. Res., 1932, 25, 290-295.

158. Tyler, R. W., What High School Pupils Forget. Educ. Res. Bull., 1930,
9, 490-492.

159. Tyler, R. W., A Generalized Technique for Constructing Achievement
Tests. Educ. Res. Bull., 1931, 10, 199-208.

160. Tyler, R. W., More Valid Measurements of College Work. J. Nat. Educ.
Asso., 1931, 20, 327-328.

161. Tyler, R. W., Measuring the Results of College Instruction. Educ. Res.
Bull., 1932, 11, 253-260.

162. Tyler, R. W., Making a Co-operative Test Service Effective. Educ. Res.
Bull., 1932, 11, 287-292.

163. Valentine, C. W., The Reliability of Examinations. London: Univ.
London Press, 1932. Pp. 195.

164. Weaver, R. B., and Traxler, A. L., Essay Examinations and Objective
Tests in United States History in the Junior High School. School
Rev., 1931, 39, 689-695.

165. Webb, L. W., and Shotwell, A. M., Standard Tests in the Elementary
School. New York: Ray Long and R. R. Smith, 1932. Pp. xiv+532.

166. Weidemann, C. C., Omission as a Specific Determiner in the True-False
Examination. J. Educ. Psychol., 1931, 22, 435-439.

167. Wheldon, C. H., Jr., and Davies, F. J. J., Method for Judging the
Discrimination of Individual Questions on True-False Examinations. J.
Educ. Psychol., 1931, 22, 290-306.

168. Whitley, M. T., A Comparison of the Seashore and the Kwalwasser-Dykema
Music Tests. Teach. Coll. Rec., 1932, 33, 731-751.

169. Wiedefeld, M. T., and Walther, E. C., Wiedefeld-Walther Geography
Test. Yonkers, N. Y.: World Book Co., 1931.

170. Wilson, H. E., Further Comments on the Scoring of Continuity Tests.
School Rev., 1930, 38, 115-123.

171. Wood, B. D., Osburn, W. J., Ruch, G. M., Trabue, M. R., Stenquist,
J. L., et al., Educational Tests and Their Uses. Rev. Educ. Res., 1933,
3, 1-80.

172. Woody, C., and Sangren, P. V., Administration of the Testing Program.
Yonkers, N. Y.: World Book Co. Pp. x+397.

173. Worcester, D. A., Still Further Comments on the Scoring of the
Continuity Test. School Rev., 1930, 38, 462-466.





BOOK REVIEWS 


Brigham, Carl C. A Study of Error. New York: College

Entrance Examination Board, 1932. Pp. xiii+384. 

This volume, a model of the typist’s and lithographer’s art, is a 
detailed “summary and evaluation of methods used in six years of
study of the scholastic aptitude test of the College Entrance Exam- 
ination Board.” During the past twenty years tests of many kinds 
have accumulated in great numbers, too often without any adequate 
analysis. This work represents the first thorough-going analysis of 
tests and test results and may be considered a true pioneer study in a 
field which must be developed before the testing movement can make 
further progress as a definite contribution to educational practice. 

It is essential that one read the author’s preface first. Here a 
brief history of the work of the research committee is given, together 
with the points of view and special activities of individual members. 
Here too is made clear the reason for the theoretical treatment of 
Meaning and Symbols given in the first two chapters. 

One might have expected to find in these chapters a summary of 
other applications and analyses of scholastic aptitude tests. Instead, 
one finds what is much more satisfactory; an exposition, for the 
most part by quotations from original sources, of the development 
of basic psychological concepts regarding the mental life. The con- 
trasting viewpoints of Titchener (structural, experimental and 
existential) and Dewey (societal, philosophical and pragmatic) are 
presented and their relations to behaviorism and the Gestalt are sug- 
gested. The author then reviews, with apparent approval, the inter- 
pretation of Ogden and Richards, and Spearman’s laws of cognition. 
He has only the highest praise for the expressed concepts of Lorimer 
and Holt. 

Sample pages and materials from each of the types of tests are 
presented in enough detail to understand the significance of the 
analysis of data. These include: Judgments of Similarity between 
Drawings Portraying Emotional Expression, Synonyms, Logical 
Inference, Verbal Section of the Scholastic Aptitude Test, Number, 
Princeton Test of 1925, and Spatial Relations Test. 

The major part of the report consists of a detailed analytical study 
of the test items. The techniques employed constitute a major
contribution of the volume. One agrees with the concluding statement,
“The procedures laid down appear, in general, to be adequate for the
study of problems of any type whatsoever, and the systematic
cataloguing of error should provide the basis needed for a genuine
science of education” (p. 302). The studies indicate that error is
not capricious, but orderly. In some instances biserial r is used to
find the correlation of items with the criterion. The difficulty of
items was computed. Correlations and intercorrelations are in
certain cases accompanied by a study of tetrad differences which
leads the author to conclude that there is “indisputable clear presence
of group factors over and above those common to all measures which
cause correlation between certain of the measures” (p. 38). These
factors he regards not as “psychological realities, or existences, but
merely as certain decimal multipliers.” The test items are in general
found to be very stable in point of difficulty with the various groups
tested and he advises that “future work should be directed toward
the discovery of new types of items and studies of the
inter-relationships of various tasks” (p. 302).
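
The three item statistics named in this paragraph (item difficulty, biserial r, and the tetrad difference) may be illustrated by a minimal sketch in Python. It is not Brigham's own procedure, which the review does not reproduce; the function names and the sample figures are assumptions introduced only for illustration, and the formulas are the usual textbook definitions.

```python
from statistics import NormalDist, mean, pstdev


def item_difficulty(responses):
    """Proportion of examinees passing the item (1 = pass, 0 = fail)."""
    return sum(responses) / len(responses)


def biserial_r(responses, criterion):
    """Biserial correlation of a pass/fail item with a continuous criterion.

    Textbook form: (M_pass - M_fail) / sd * (p * q / y), where y is the
    ordinate of the unit normal curve at the point cutting off proportion p.
    """
    p = item_difficulty(responses)
    q = 1.0 - p
    if p == 0.0 or p == 1.0:
        raise ValueError("item must have both passes and failures")
    passed = [c for r, c in zip(responses, criterion) if r == 1]
    failed = [c for r, c in zip(responses, criterion) if r == 0]
    sd = pstdev(criterion)  # standard deviation of the whole group
    if sd == 0.0:
        raise ValueError("criterion scores must vary")
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    return (mean(passed) - mean(failed)) / sd * (p * q / y)


def tetrad_difference(r12, r34, r13, r24):
    """Spearman's tetrad difference; near zero when one common factor suffices."""
    return r12 * r34 - r13 * r24


if __name__ == "__main__":
    # Hypothetical data: six examinees, one item, one criterion score each.
    item = [1, 1, 0, 1, 0, 0]
    scores = [6, 5, 4, 3, 2, 1]
    print("difficulty:", item_difficulty(item))
    print("biserial r:", round(biserial_r(item, scores), 3))
    # Correlations generated by a single factor with loadings .9, .8, .7, .6.
    print("tetrad:", tetrad_difference(0.72, 0.42, 0.63, 0.48))
```

With correlations produced by a single common factor the tetrad difference vanishes, so a value clearly different from zero is the kind of evidence for group factors that the author reports. Note that biserial r assumes a normally distributed trait underlying the pass/fail item and can exceed 1 in small or sharply separated samples.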

The Appendices include sections on Administration of the 
Scholastic Aptitude Tests and Statistical Methods as well as reprints 
of the six annual reports of the Commission on Scholastic Aptitude 
Tests, 1926-31. 

The report has many suggestive psychological and educational 
implications which can be appreciated only by a thorough study of 
its pages.
Paul V. West.
School of Education, New York University.

























Vernon, M. D. The Experimental Study of Reading. Cambridge: 

Cambridge University Press, 1931. Pp. xv+190. 

To the teacher and clinician interested in scientific findings upon 
which to develop methods of teaching reading and to the student 
seeking a short-cut to acquaintance with this field of research, 
Vernon’s book should be very helpful. It is not, however, an
exhaustive treatise. Of the manifold phases to the psychology of
reading, the author offers a “concise account” of the experimental
literature on eye movements, visual perception, legibility, and a too
brief discussion of reading and reading disabilities of children.

In general, the organization of the contents is commendable, but
the tendency toward logical completeness has perhaps resulted in the
inclusion of an unnecessary amount of general related facts. A com- 
prehensive historical survey of the developments in methods of 
observing and registering eye movements, followed by an exposition 
of the types of eye movements and the sensations accompanying
ocular behavior, introduces the subject. The description of eye 
movements in reading, which is the most important and original 
section, includes a characterization of the typical sequences of fixation 
pauses acquired by the mature reader and the variations in this
“specially adapted series” imposed by several conditioning factors. 
Printing arrangement, age and maturity, purpose and attitude of 
reader, nature of reading content, and individual differences in motor 
habits are all found to influence the pattern of eye movements. The 
adult percept in reading is shown to be “a vague, blurred, visual 
impression, filled out with the help of a variety of thought processes 
which are concurrent with reading”; perception in the child, it is
noted, is “predominantly of a subjective nature, in that the part
played by the objective stimulus is secondary to that of familiarity 
and interest.” The section on typographical factors deals with the 
methods and results in the attempt to achieve optimum standards for 
legibility of print. 

Especially interesting are the author’s own experimental contri- 
butions and analyses of subjective factors affecting eye movement 
patterns. More space devoted to this type of material at the cost of 
abbreviating or omitting much of the review of tachistoscopic and 
introspective studies on adult visual perception would have resulted 
in no loss of general merit. There is no consistent attempt to inte- 
grate the various factors abstracted for description (motor, percep- 
tual, and central) and to look upon reading as a unitary process. To 
be sure, the influence of subjective, presumably central, factors upon 
the oculomotor processes is pointed out, but at times the motor aspect 
is emphasized and eye movements in reading are regarded as though 
independent motor habits, even “automatic.” The “well adapted” 
variations in eye movement patterns conditioned by changes in pur- 
pose and attitude of reader and by differences in content of material, 
however, indicate that the oculomotor behavior patterns are secondary 
and highly flexible in their adjustments to the central processes of 
assimilation and association. 

The data on which conclusions are based are not evaluated sta- 
tistically in terms of number of cases and reliability and validity of 
measures. This, however, is not necessarily a criticism of the 
author’s work; it means that many of the conclusions are to be
regarded as tentative, offering hypotheses for research problems of
verification.
Arden N. Frandsen.
University of Minnesota.


Dunlap, Knight. Habits: Their Making and Unmaking. New

York: Liveright, Inc., 1932. Pp. x+326.

The main thesis of the book is developed in the second chapter 
entitled, “The Fundamental Principles of Learning,” viz., the 
process by which we learn is not the act that is learned. The activi- 
ties that are performed during the initial periods of practice differ 
from those in the final practice period, and these final acts are the 
ones that are established or learned, and their establishment is a 
function of the preceding practice periods. 

There can be no objection to this principle. By definition, learn- 
ing involves progressive modification of what we do during practice. 
It involves progressive improvement due to practice. Without modi- 
fication and improvement during practice there would be no learning. 
In learning to typewrite, we measure the progressive change in our 
behavior in respect to speed and accuracy. The successive activities 
involved in learning differ at least in these two respects. Other 
changes and modifications are present and might be measured, such 
as ease and grace of performance, total energy output impinging upon 
the keys, etc. Dunlap’s principle is obviously contained in the very 
definition of learning. 

Performance changes during learning and in a variety of ways. 
Are all of these equally significant, or does their importance vary 
with our purposes? Which of these is Dunlap concerned with? He 
is primarily concerned with changes in the behavior pattern—the 
sensory, ideational, emotional, and motor composition of the practice 
activities. Two types of pattern modification are given: (1) In 
learning a maze, the activity is modified by the dropping out of the 
retracings and the entrances into the blinds, though Dunlap illustrates 
the principle in terms of dart throwing. There is nothing novel in 
this type of modification. (2) One of the distinctive features of 
Dunlap’s position is the second type of modification. Learning 
always, or at least usually, involves thinking, and it is primarily 
through these activities that learning is achieved. Three modes of 
thinking may be present—anticipatory, retrospective, and the simpler 
type of ‘mere imagination.’

Anticipatory thinking is involved in desire and purpose, and it is 
these that make learning possible or at least effective. As the act is 
learned and becomes automatic, these ideational activities tend to 
drop out. We are told that the final act that is learned is present in 
the earlier stages of practice only as an ideal. The activities involved 
in learning are thus different from that which is learned. 

Retrospective thinking is involved in the recognition of errors
and successes, and this recognition is effective in achieving progress, 
though he does not say that it is essential to learning. 

We are told that all animal forms can learn. There is some indi- 
cation that plants can learn. Learning begins at birth, and he 
indicates his belief that learning occurs during foetal development. 
Are ideational activities involved in all these cases of learning? The 
author inclines to such a view while admitting that such a doctrine is 
pretty much of an assumption, and of course there is nothing wrong 
about making assumptions, if we are aware of the fact and do not 
deceive our readers. Dunlap, however, does not assume that antici- 
patory and retrospective thinking are necessarily involved in animal 
learning. It is the simpler type of ‘mere imagination’ that is 
probably present in the rat. This concept of ‘mere imagination’ 
is not adequately defined, however. 

There are certain questions of terminology. Let one type a given 
page of copy once a day for thirty days. If there is learning, these 
practice activities will differ in certain respects, but also they will be 
alike in other respects. Any two activities are both alike and dif- 
ferent. If they were not different, they would not be two, and if 
they were not alike they would not be classed as acts. Are we deal- 
ing with thirty different acts, or with modifications of the same act? 
If they are different acts, no act is ever repeated, and hence we do 
not learn by repeating an act. It is practice that is repeated, and this 
always changes the act. No two things can ever be absolutely the 
same, or otherwise they would not be two. Things are the same 
when they are alike in respect to some group of characteristics which 
we regard as important and significant, and what these are may vary 
with circumstances. From one standpoint it is perfectly proper to 
speak of one act which has been repeated thirty times and been 
modified as a consequence. If we regard the pattern as the signifi- 
cant feature, they may be properly regarded as thirty acts rather than 
as one. Dunlap is primarily interested in the differences of the 
practice activities rather than in their likenesses. 

Dunlap is sceptical of physiological theories of learning, and of 
all physiological explanations of mental phenomena. Practice modi- 
fies the pattern, and hence a given pattern of activity is not engrained 
and fixated by repetition. The old conception of the deepening of a 
definite system of brain paths, the strengthening of a specific system 
of neural bonds is thus easily disposed of. He notes that this
conception was too much dominated by analogies from inanimate objects,
as the creasing of a sheet of paper by folding. 

“Much that passes for brain physiology is merely a mass of 
psychological facts of perception, feeling and thinking, translated 
into terms of the anatomy of the brain in accordance with various 
arbitrary assumptions.” “There is nothing especially ‘physiological’
about the work, and in spite of Pavlov’s insistence that he is
studying the brain, it is difficult to see that he is, in the more striking 
parts of the investigations, studying the brain in any way different 
from that in which Small, Thorndike, and Watson, in their experi- 
ments on animal learning, may be said to have been studying the 
brain.” Pavlov described learning in new terms such as ‘conditioning,’
and his followers have used these terms as explanations of
learning. 

Practice changes the act, and hence we come to the doctrine that 
bad habits can be eliminated by practicing them. The method is dis- 
cussed in three chapters. I do not understand that this is the only 
way of eliminating or breaking an undesirable habit. Rather it is 
the most feasible method for certain habits. The method has been 
found to be successful with stammering, tics, finger nail biting, 
thumb-sucking, but has not been so successful with masturbation 
and homosexual practices for obvious reasons. It is also discussed 
in reference to emotional habits, and various social traits and 
attitudes. 

The conditions for practice are explicitly stated on page 196, and 
need not be repeated. It is admitted that there are difficulties in 
fulfilling the requisite conditions. We are told that the method can 
not be successfully used by the layman, the mere laboratory psycholo- 
gist, the ordinary run of ‘clinical’ psychologist, the medical prac- 
titioner, the psychiatrist, nor the psychologist who is tainted with 
psychoanalysis, behavioristic or other ‘school’ theories. Even then
the residual group of psychologists must take a year to acquire the 
method, but must have some ability to make contacts and the wisdom 
to analyze cases and profit by their work. This is quite a formidable 
list of limitations, and it is quite likely to induce some scepticism as 
to the pragmatic worth of the method. The reviewer has always been 
inclined to believe that there is something to the method, but he feels 
the need for a more detailed account of the causal mechanisms 
involved. Let us grant that practice changes the act, but why does 
it disrupt and eliminate the bad habit, rather than achieve some other 
undesirable result? Why does practice always modify an act in some
given direction?

What does Dunlap mean by practice? Certainly there must be a 
difference between ‘practicing’ an act and ‘performing’ it, or
otherwise the stutterer would speedily cure himself. I suspect that there
is also a difference between practicing an act to get rid of it and 
practicing in order to fixate it. It is practice with the former intent 
that Dunlap emphasizes in his therapeutical procedure. Is it the 
‘intent’ that is the effective factor? 

The book was probably not intended as a comprehensive manual 
on learning. The author was primarily interested in developing and 
emphasizing certain features of learning which he thinks had been 
previously neglected. The book is well written and the account is 
interesting. The author has a distinctive slant on all topics which he 
discusses, and his mode of expression is unique and refreshing. The 
book contains many illuminating discussions. Particular mention 
may be made of his treatment of the concept of the physiological 
reflex on pages 64-66, much of the chapter on Remembering and 
Forgetting, and many of his critical comments in the final chapter on 
Learning Ability and Intelligence. The appendix contains an excel- 
lent classified bibliography on learning. 

No review, according to custom, is complete without a few criti- 
cisms. In discussing guidance on page 107, he tells us that we might 
lead an animal through the maze. It would be difficult to guide a rat, 
but one can easily guide a dog or goat, though this has not been done
experimentally. As a substitute, rats have been prevented from entering
the culs de sac by means of doors. Again in discussing transfer,
he says, on page 121, that the learning of one maze by a rat seriously 
inhibits the learning of a second maze. Transfer does occur in 
human learning, whereas in the rat nothing but inhibition results. 
These statements do not harmonize with the writer’s knowledge of 
the experimental literature on rat learning. There is one experiment 
in which rats were led through the maze, if I know what ‘led’ 
means, and in respect to transfer the mastery of one problem sometimes
facilitates and sometimes inhibits the learning of a second problem.
Harvey A. Carr.